This question already has answers here:
What does "select((select(s),$|=1)[0])" do in Perl?
(7 answers)
Closed 4 years ago.
What is the [0] doing in this code:
select((select(LOG_FILE),$!=1)[0]);
UPDATE: I answered this ten years ago! What does “select((select(s),$|=1)[0])” do in Perl?
You're looking at a single element access to a list. The expression in side the parentheses produces some sort of list and the [0] selects one item from the list.
This bit of code is a very old idiom to set a per-filehandle kinda-global variable. I think you probably meant $| (the autoflush setting) instead of $!.
First, remember that Perl has the concept of a "default filehandle". That starts out as standard output, but you can change it. That's what the select does.
Next, realize that each file handle knows its own settings for various things; these are represented by special variables such as $| (see perlvar's section on "Variables related to Filehandles"). When you change these variables, they apply to the current default filehandle.
So, what you see in this idiom is an inner select that changes the default filehandle. You change the default then set $| to whatever value you want. It looks a bit odd because you have two expressions separated by a comma instead of a semicolon, the use statement separator:
(select(LOG_FILE), $|=1)
From this, the idiom wants the result of the select; that's the previous default filehandle. To get that you want the first item in that list. That's in index 0:
(select(LOG_FILE), $|=1)[0]
The result of that entire expression is the previous default filehandle, which you now want to restore. Do that with the outer select:
select((select(LOG_FILE), $|=1)[0]);
You could have written that with an intermediate variable:
my $previous = select LOG_FILE;
$| = 1;
select($previous);
If you're writing new stuff on your own, you might use scalar variable for the filehandle then call its autoflush method:
open my $log_file_fh, '>', $log_filename or die ...;
$log_file_fh->autoflush(1);
( LIST1 )[ LIST2 ] is a list slice. In list context, it evaluates to the elements of LIST1 specified by LIST2.
In this case, it returns the result of the select.
select((select(LOG_FILE),$!=1)[0]);
should be
select((select(LOG_FILE),$|=1)[0]);
The latter enables auto-flushing for the LOG_FILE file handle. It can be written more clearly as follows:
use IO::Handle (); # Only needed in older versions of Perl.
LOG_FILE->autoflush(1);
By the way, you shouldn't be using global variables like that. Instead of
open LOG_FILE, ...
you should be using
open my $LOG_FILE, ...
Related
Here I fixed most of my mistakes and thank you all, any other advice please with my hash at this point and how can I clear each word and puts the word and its frequency in a hash, excluding the empty words.. I think my code make since now.
So you can focus on the key part of the algorithm, how about accepting input on STDIN and output to STDOUT. That way there's no argument checking, etc. Just a simple:
$ prog < words.txt
All you really need is a very simple algorithm:
Read a line
Split it into words
Record a count of the word
When done, display the counts
Here's a sample program
#! /usr/bin/perl -w
use strict;
my (%data);
while (<STDIN>) {
chomp;
my(#words) = split(/\s+/);
foreach my $word (#words) {
if (!defined($data{$word})) {
$data{$word} = 0;
}
$data{$word}++;
}
}
foreach (sort(keys(%data))) {
print "$_: $data{$_}\n";
}
Once you understand this and have it working in your environment, you can extend it to meet your other requirements:
remove non-alphabetic characters from each word
print three results per line
use input and output files
put the algorithm into a subroutine
I agree that starting with dave's answer would be more productive, but if you are interested in your mistakes, here is what I see:
You assign the return value of checkArgs to a scalar variable $checkArgs, but return an array value. It means that $checkArgs will always contain 2 (the size of the array) after this call (because the program dies if the number of arguments is not 2). It is not very bad since you do not use the value later, but why you need it at all in this case?
You open files and close them immediately without reading from them. Does not make sense.
Statement
while (<>)
reads either from standard output or from all files in the command line arguments. The latter variant is like what you want, but your second argument is the output file, not input. The diamond operator will try to read from it too. You have two options: a) use only one file name in the command line arguments, read the file with <>, use standard output for output, and redirect output to a file in shell; b) use
while(<$file1>)
instead, of course, before closing files. Option a) is the traditional Unix- and Perl-style, but b) provides for clearer code for beginners.
Statements
return $word;
and
return $str, $hash{$str};
return corresponding values on the first iterations of the loops, all other data remain unprocessed. In the first case, you should create a local array, store all $word in it and return the array as a whole. In the second case, you already have such local %hash, it is enough to return this hash. In both cases, you need should assign the return values of the functions not to scalars, but to an array and a hash correspondingly. Now, you actually lose all you data.
I have kind of fundamental question about scalars in Perl. Everything I read says scalars hold one value:
A scalar may contain one single value in any of three different
flavors: a number, a string, or a reference. Although a scalar may not
directly hold multiple values, it may contain a reference to an array
or hash which in turn contains multiple values.
--from perldoc
Was curious how the code below works
open( $IN, "<", "phonebook.txt" )
or die "Cannot open the file\n";
while ( my $line = <$IN> ) {
chomp($line);
my ( $name, $area, $phone ) = split /\|/, $line;
print "$name $phone $phone\n";
}
close $IN;
Just to clarify the code above is opening a pipe delimited text file in the following format name|areacode|phone
It opens the file up and then it splits them into $name $area $phone; how does it go through the multiple lines of the file and print them out?
Going back to the perldoc quote from above "A scalar may contain a single value of a string, number, reference." I am assuming that it has to be a reference, but doesn't even really seem like a reference and if it is looks like it would a reference of a scalar? so I am wondering what is going on internally that allows Perl to iterate through all of the lines in the code?
Nothing urgent, just something I noticed and was curious about. Thanks.
It looks like Borodin zeroed in on the part you wanted, but I'll add to it.
There are variables, which store things for us, and there are operators, which do things for us. A file handle, the thing you have in $IN, isn't the file itself or the data in the file. It's a connection that the program to use to get information from the file.
When you use the line input operator, <>, you give it a file handle to tell it where to grab the next line from. By itself, it defaults to ARGV, but you can put any file handle in there. In this case, you have <$IN>. Borodin already explained the reference and bareword stuff.
So, when you use the line input operator, it look at the connection you give in then gets a line from that file and returns it. You might be able to grok this more easily with it's function form:
my $line = readline( $IN );
The thing you get back doesn't come out of $IN, but the thing it points to. Along the way, $IN keeps track of where it is in the file. See seek and tell.
Along the same lines are Perl's regexes. Many people call something like /foo.*bar/ a regular expression. They are slightly wrong. There's a regular expression inside the pattern match operator //. The pattern is the instructions, but it doesn't do anything by itself until the operator uses it.
I find in my classes if I emphasize the difference between the noun and verb parts of the syntax, people have a much easier time with this sort of stuff.
Old Answer
Through each iteration of the while loop, exactly one value is put into the scalar variables. When the loop is done with a line, everything is reset.
The value in $line is a single value: the entire line which you have not broken up yet. Perl doesn't care what that single value looks like. With each iteration, you deal with exactly one line and that's what's in $line. Remember, these are variables, which means you can modify and replace their values, so they can only hold one thing at a time, but there can be multiple times.
The scalars $name, $area, and $phone have single values, each produced by split. Those are lexical variables (my), so they are only visible inside the specific loop iteration where they are defined.
Beyond that, I'm not sure which scalar you might be confused about.
The old-fashioned way of opening files is to use a bare name for the file handle, like so
open IN, 'phonebook.txt'
A file handle is a special type of value, like scalar, hash, array etc. but it has no prefix symbol to differentiate it. (This isn't actually the full extent of the truth, but I am worried about confusing you if I add even more detail.)
Perl still works like this, but it is best avoided for a couple of reasons.
All such file handles are global, and there is no way to restrict access to them by scope
There is no way to pass the value to a subroutine or store it in a data structure
So Perl was enhanced several years ago so that you can use references to file handles. These can be stored in scalar variables, arrays, or hashes, and can be passed as subroutine parameters.
What happens now when you write
open my $in, '<', 'phonebook.txt'
is that perl autovivifies an anonymous file handle, and puts a reference to it in variable $in, so yes, you were right, it is a reference. (Another thing that was changed about the same time was the move to three-parameter open calls, which allow you to open a file called, say, >.txt for input.)
I hope that helps you to understand. It's an unnecessary level of detail, but it can often help you to remember the way Perl works to understand the underlying details.
Incidentally, it is best to keep to lower-case letters for lexical variables, even for file handle references. I often add fh to the end to indicate that the variable holds a file handle, like $in_fh. But there's no need to use capitals, which are generally reserved for global variables like Package::Names.
Update - The Rest of the Story
I thought I should add something to explain what I have mised out, for fear of misleading people who care about the gory detail.
Perl keeps a symbol table hash - a stash - that work very like ordinary Perl hashes. There is one such stash for each package, including the default package main. Note that this hash nothing to do with lexical variables - declared with my - which are stored entirely separately.
Ther indexes for the stashes are the names of the package variables, without the initial symbol. So, for example, if you have
our $val;
our #val;
our %val;
then the stash will have only a single element, with a key of val and a value which is a reference to an intermediate structure called a typeglob. This is another hash structure, with one element for each different type of variable that has been declared. In this case our val typeglob will have three elements, for the scalar, array, and hash varieties of the val variables.
One of these elements may also be an IO variable type, which is where file handles are kept. But, for historical reasons, the value that is passed around as a file handle is in fact a reference to the typeglob that contains it. That is why, if you write open my $in, '<', 'phonebook.txt' and then print $in you will see something like GLOB(0x269581c) - the GLOB being short for typeglob.
Apart from that, the account above is accurate. Perl autovivifies an anonymous typeglob in the current package, and uses only its IO slot for the file handle.
Scalars in Perl are denoted by a $ and they can indeed contain the type of values you mention in your questions but next to that they can also contain a file handle. You can create file handles in Perl in two ways one way is Lexical
open my $filehandle, '>', '/path/to/file' or die $!;
and the other is global
open FILEHANDLE, '>', '/path/to/file' or die $!;
You should use the Lexical version which is what you're doing.
The while loop in your code uses the <> operator on your lexical filehandle which returns a line out of your file every time it's called, until it's out of lines (when End Of File is reached) in which case it returns false.
I went into a bit more detail on file handles as it seems it's a concept you're not completely clear on.
I apologize if this question sounds simple, my intention is to understand in depth how this (these?) particular operator(s) works and I was unable to find a satisfactory description in the perldocs (It probably exists somewhere, I just couldn't find it for the life of me)
Particularly, I am interested in knowing if
a) <>
b) <*> or whatever glob and
c) <FH>
are fundamentally similar or different, and how they are used internally.
I built my own testing functions to gain some insight on this (presented below). I still don't have a full understanding (my understanding might even be wrong) but this is what I've concluded:
<>
In Scalar Context: Reads the next line of the "current file" being read (provided in #ARGV). Questions: This seems like a very particular scenario, and I wonder why it is the way it is and whether it can be generalized or not. Also what is the "current file" that is being read? Is it in a file handle? What is the counter?
In List Context: Reads ALL of the files in #ARGV into an array
<list of globs>
In Scalar Context: Name of the first file found in current folder that matches the glob. Questions: Why the current folder? How do I change this? Is the only way to change this doing something like < /home/* > ?
In List Context: All the files that match the glob in the current folder.
<FH> just seems to return undef when assigned to a variable.
Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
General Question: What is it that handles the value of <> and the others during execution? In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I also noticed that even though I am assigning them in sequence, the output is reset each time. i.e. I would have assumed that when I do
$thing_s = <>;
#thing_l = <>;
#thing_l would be missing the first item, since it was already received by $thing_s. Why is this not the case?
Code used for testing:
use strict;
use warnings;
use Switch;
use Data::Dumper;
die "Call with a list of files\n" if (#ARGV<1);
my #whats = ('<>','<* .*>','<FH>');
my $thing_s;
my #thing_l;
for my $what(#whats){
switch($what){
case('<>'){
$thing_s = <>;
#thing_l = <>;
}
case('<* .*>'){
$thing_s = <* .*>;
#thing_l = <* .*>;
}
case('<FH>'){
open FH, '<', $ARGV[0];
$thing_s = <FH>;
#thing_l = <FH>;
}
}
print "$what in scalar context is: \n".Dumper($thing_s)."\n";
print "$what in list context is: \n".Dumper(#thing_l)."\n";
}
The <> thingies are all iterators. All of these variants have common behaviour:
Used in list context, all remaining elements are returned.
Used in scalar context, only the next element is returned.
Used in scalar context, it returns undef once the iterator is exhausted.
These last two properties make it suitable for use as a condition in while loops.
There are two kinds of iterators that can be used with <>:
Filehandles. In this case <$fh> is equivalent to readline $fh.
Globs, so <* .*> is equivalent to glob '* .*'.
The <> is parsed as a readline when it contains either nothing, a bareword, or a simple scalar. More complex expression can be embedded like <{ ... }>.
It is parsed as a glob in all other cases. This can be made explicit by using quotes: <"* .*"> but you should really be explicit and use the glob function instead.
Some details differ, e.g. where the iterator state is kept:
When reading from a file handle, the file handle holds that iterator state.
When using the glob form, each glob expression has its own state.
Another part is if the iterator can restart:
glob restarts after returning one undef.
filehandles can only be restarted by seeking – not all FHs support this operation.
If no file handle is used in <>, then this defaults to the special ARGV file handle. The behaviour of <ARGV> is as follows:
If #ARGV is empty, then ARGV is STDIN.
Otherwise, the elements of #ARGV are treated as file names. The following pseudocode is executed:
$ARGV = shift #ARGV;
open ARGV, $ARGV or die ...; # careful! no open mode is used
The $ARGV scalar holds the filename, and the ARGV file handle holds that file handle.
When ARGV would be eof, the next file from #ARGV is opened.
Only when #ARGV is completely empty can <> return undef.
This can actually be used as a trick to read from many files:
local #ARGV = qw(foo.txt bar.txt baz.txt);
while (<>) {
...;
}
What is it that handles the value of <> and the others during execution?
The Perl compiler is very context-aware, and often has to choose between multiple ambiguous interpretations of a code segment. It will compile <> as a call to readline or to glob depending on what is inside the brackets.
In scalar context, is any sort of reference returned, or are the variables that we assign them to, at that point identical to any other non-ref scalar?
I'm not sure what you're asking here, or why you think the variables that take the result of a <> should be any different from other variables. They are always simple string values: either a filename returned by glob, or some file data returned by readline.
<FH> just seems to return undef when assigned to a variable. Questions: Why is it undef? Does it not have a type? Does this behave similarly when the FH is not a bareword filehandle?
This form will treat FH as a filehandle, and return the next line of data from the file if it is open and not at eof. Otherwise undef is returned, to indicate that nothing valid could be read. Perl is very flexible with types, but undef behaves as its own type, like Ruby's nil. The operator behaves the same whether FH is a global file handle or a (variable that contains) a reference to a typeglob.
I've following two statements written in perl :
#m1 = ( [1,2,3],[4,5,6],[7,8,9] ); # It is an array of references.
$mr = [ [1,2,3],[4,5,6],[7,8,9] ]; # It is an anonymous array. $mr holds reference.
When I try to print:
print "$m1[0][1]\n"; # this statement outputs: 2; that is expected.
print "$mr->[0][1]\n"; #this statement outputs: 2; that is expected.
print "$mr[0][1]\n"; #this statement doesn't output anything.
I feel second and third print statements are same. However, I didn't any output with third print statement.
Can anyone let me know what is wrong with third print statement?
This is simple. $mr is a reference. So you use the Arrow Operator to dereference.
Also, if you would use use warnings; use strict;, you would have received a somewhat obvious error message:
Global symbol "#mr" requires explicit package name
$mr is a scalar variable whose value is a reference to a list. It is not a list, and it can't be used as if it was a list. The arrow is needed to access the list it refers to.
But hold on, $m1[0] is also not a list, but a reference to one. You may be wondering why you don't have to write an arrow between the indexes, like $m1[0]->[1]. There's a special rule that says you can omit the arrow when accessing list or hash elements in a list or hash of references, so you can write $mr->[0][1] instead of $mr->[0]->[1] and $m1[0][1] instead of $m1[0]->[1].
$mr holds a reference (conceptually similar to the address of a variable in compiled languages). thus you have an extra level of indirection. replace $mrwith $$mr and you'll be fine.
btw, you can easily check questions like these by browsing for tutorials on perldoc.
You said:
print "$m1[0][1]\n"; # this statement outputs: 2; that is expected.
print "$mr[0][1]\n"; #this statement doesn't output anything.
Notice how you used the same syntax both times.
As you've established by this first line, this syntax accesses the array named: #m1 and #mr. You have no variable named #mr, so you get undef for $mr[0][1].
Maybe you don't realizes that scalar $mr and array #mr have no relation to each other.
Please use use strict; use warnings; to avoid these and many other errors.
I've seen some horrific code written in Perl, but I can't make head nor tail of this one:
select((select(s),$|=1)[0])
It's in some networking code that we use to communicate with a server and I assume it's something to do with buffering (since it sets $|).
But I can't figure out why there's multiple select calls or the array reference. Can anyone help me out?
It's a nasty little idiom for setting autoflush on a filehandle other than STDOUT.
select() takes the supplied filehandle and (basically) replaces STDOUT with it, and it returns the old filehandle when it's done.
So (select($s),$|=1) redirects the filehandle (remember select returns the old one), and sets autoflush ($| = 1). It does this in a list ((...)[0]) and returns the first value (which is the result of the select call - the original STDOUT), and then passes that back into another select to reinstate the original STDOUT filehandle. Phew.
But now you understand it (well, maybe ;)), do this instead:
use IO::Handle;
$fh->autoflush;
The way to figure out any code is to pick it apart. You know that stuff inside parentheses happens before stuff outside. This is the same way you'd figuring out what code is doing in other languages.
The first bit is then:
( select(s), $|=1 )
That list has two elements, which are the results of two operations: one to select the s filehandle as the default then one to set $| to a true value. The $| is one of the per-filehandle variables which only apply to the currently selected filehandle (see Understand global variables at The Effective Perler). In the end, you have a list of two items: the previous default filehandle (the result of select), and 1.
The next part is a literal list slice to pull out the item in index 0:
( PREVIOUS_DEFAULT, 1 )[0]
The result of that is the single item that is previous default filehandle.
The next part takes the result of the slice and uses it as the argument to another call to select
select( PREVIOUS_DEFAULT );
So, in effect, you've set $| on a filehandle and ended up back where you started with the default filehandle.
select($fh)
Select a new default file handle. See http://perldoc.perl.org/functions/select.html
(select($fh), $|=1)
Turn on autoflush. See http://perldoc.perl.org/perlvar.html
(select($fh), $|=1)[0]
Return the first value of this tuple.
select((select($fh), $|=1)[0])
select it, i.e. restore the old default file handle.
Equivalent to
$oldfh = select($fh);
$| = 1;
select($oldfh);
which means
use IO::Handle;
$fh->autoflush(1);
as demonstrated in the perldoc page.
In another venue, I once proposed that a more comprehensible version would be thus:
for ( select $fh ) { $| = 1; select $_ }
This preserves the compact idiom’s sole advantage that no variable needs be declared in the surrounding scope.
Or if you’re not comfortable with $_, you can write it like this:
for my $prevfh ( select $fh ) { $| = 1; select $prevfh }
The scope of $prevfh is limited to the for block. (But if you write Perl you really have no excuse to be skittish about $_.)
It's overly clever code for turning on buffer flushing on handle s and then re-selecting the current handle.
See perldoc -f select for more.
please check perldoc -f select. For the meaning of $|, please check perldoc perlvar
It is overoptimization to skip loading IO::Handle.
use IO::Handle;
$fh->autoflush(1);
is much more readable.