I am starting to learn Perl, so I am trying to read some posts here at SO. Now I came across this code https://stackoverflow.com/a/22310773/2173773 (simplified somewhat here) :
echo "1 2 3 4" | perl -lane'
$h{#F} ||= [];
print $_ for keys %h;
'
What does this code do, and why does this code print 4?
I have tried to study Perl references at http://perldoc.perl.org/perlreftut.html , but I still could not figure this out.
(I am puzzled about this line: $h{#F} ||= [].. )
The -n option (part of -lane) causes Perl to execute the given code for each individual line of input.
The -a option (when used with the -n or -p option) causes Perl to split every line of input on whitespace and store the fields in the #F variable.
$something ||= [] is equivalent to $something = $something || []; i.e., it assigns [] (a reference to an empty array) to the variable $something if & only if $something is already false or undefined.
$h{#F} is an element of the hash %h. Because this expression begins with $ (rather than #), the subscript #F is evaluated in scalar context, and scalar context for an array makes the array evaluate to its length. As the Perl code is only ever executed on the line 1 2 3 4, which is split into four elements, #F will only be four elements long, so $h{#F} is here equivalent to $h{4} (or, technically, $h{"4"}).
Thus, [] will be assigned to $h{"4"}, and as 4 is the only element of the hash %h in existence, keys %h will return a list containing only "4", and printing the elements of this list will print 4.
Related
Small debug question I can't solve for some reason. Consider the following code:
use warnings;
my $flag = 0;
foreach my $i (0..scalar(#ARGV)) {
$data{$OPTION} .= $ARGV[$i]." " if($flag);
$flag = 1 if($ARGV[$i] =~ /$OPTION/);
undef $ARGV[$i] if($flag);
}
I get the following two warnings:
Use of uninitialized value within #ARGV in concatenation (.) or string at line 4
Use of uninitialized value in pattern match (m//) at line 5
I get the reason is that I undefine some value of #ARGV and then it tries to check it.
The way I do it like this is because I would like to 'cut' some of the data of #ARGV before using GetOpt module (which uses this array).
How to solve it?
Let's expand on those comments a bit.
Imagine #ARGV contains four elements. They will have the indexes 0, 1, 2 and 3 (as arrays in Perl are zero-based).
And your loop looks like this:
foreach my $i (0..scalar(#ARGV)) {
You want to visit each element in #ARGV, so you use the range operator (..) to generate a list of all those indexes. But scalar #ARGV returns the number of elements in #ARGV and that's 4. So your range is 0 .. 4. And there's no value at $ARGV[4] - so you get an "undefined value" warning (as you're trying to read past the end of an array).
A better way to do this is to use $#ARGV instead of scalar #ARGV. For every array variable in Perl (say #foo) you also get a variable (called $#foo) which contains the last index number in the array. In our case, that's 3 and your range (0 .. $#ARGV) now contains the integers 0 .. 3 and you no longer try to read past the end of the array and you don't get the "undefined value" warnings.
There's one other improvement I would suggest. Inside your loop, you only ever use $i to access an element from #ARGV. It's only used in expressions like $ARGV[$i]. In this case, it's probably better to skip the middle man and to iterate across the elements in the array, not the indexes.
I mean you can write your code like this:
foreach my $arg (#ARGV) {
$data{$OPTION} .= $arg . " " if($flag);
$flag = 1 if($arg =~ /$OPTION/);
undef $arg if($flag);
}
I think that's a little easier to follow.
I am seeking explanation of the syntax of Perl's uniq and fidrstidx function from module MoreUtils.pm.
Having sought that, I already know other ways to get uniq array elements from an array having duplicate elements and finding the first index from an array by below ways :
## remove duplicate elements ##
my #arr = qw (2 4 2 8 3 4 6);
my #uniq = ();
my %hash = ();
#uniq = grep {!$hash{$_}++ } #arr;
### first index ###
#arr = qw (Java ooperl Ruby cgiperl Python);
my ($index) = grep {$arr[$_] =~ /perl/} 0..$#arr;
Can anybody please explain me second line of this below sub uniq function comprising map and ternary operator from MoreUtils.pm:
map {$h{$_}++ == 0 ? $_ : () } #;
and also
the &# passed to firstidx function and the below line in the body of the function :
local *_ = \$_[$i];
What I understand that sub routine ref is passed to firstidx. But a bit more detailed explanation will be much appreciated.
Thanks.
Your second question was answered in the comments.
Your first question asks about map {$h{$_}++ == 0 ? $_ : () } #; from List::MoreUtils. In recent versions, it's actually in List::MoreUtils::PP (for Pure Perl) since many of the subroutines are also implemented in C and XS. Here's the current version of the Pure Perl uniq:
sub uniq (#)
{
my %seen = ();
my $k;
my $seen_undef;
grep { defined $_ ? not $seen{ $k = $_ }++ : not $seen_undef++ } #_;
}
This has the same map technique although it's using grep instead. The grep goes through all of the elements in #_ and has to return either true or false for each of them. The elements which evaluate to true end up in the output list. The code then wants to make an element evaluate to true the first time it sees it and false the rest of the times.
In this code it handles undef separately. If the current element is not undef, it does the first branch of the conditional operator and the second branch otherwise. Now let's look at the branches.
The defined case adds an element to a hash. No one left code comments about the use of $k but it probably has something to do with not disturbing $_. That $k becomes the key for the hash:
not $seen{ $k = $_ }++
If that is the first time that key has been encountered the value of the hash is undef. That post-increment does its work after the value is used so hold off on thinking about that for a moment. The low-precendence not sees the value of $seen{$k}, which is undef. The not turns the false value of undef into true. That true indicates that the grep has seen $_ for the first time. It becomes part of the output list. Then the ++ does its work and increments the undef value to 1. On all subsequent encounters with the same value the hash value will be true. The not will turn the true value into false and that element won't be in the output list.
The map you show implements the grep. It returns an element when the condition is true and returns no elements when it is false:
map {$h{$_}++ == 0 ? $_ : () } #_;
For each element it adds it as the key in the hash and compares the value to 0. The first time an element is seen that value is undef. In numeric context an undef is 0. So, the == returns true and the first branch of the conditional operator fires, returning $_ to the output list. The ++ then increments the hash value from undef to 1. The next time it encounters the same value the hash value is not 0 and the second branch of the conditional operator returns the empty list. That adds no elements to the output list.
Newer version of List::MoreUtils don't use the construct any more, but as Сухой27 explained,
map { CONDITION ? $_ : () } LIST
is just a fancy alternative to
grep { CONDITION } LIST
I don't think there's any overarching reason the author chose map for this implementation, and in fact it was simplified to grep in later versions of List::MoreUtils.
The firstidx syntax is firstidx BLOCK LIST. Like the builtin map and grep, it is specified that the code in BLOCK will operate on the variable $_, and that the code is allowed to make changes to $_. So in the firstidx implementation, it is not sufficient to set $_ to each value in LIST. Rather, $_ must be aliased to each element of LIST so that a change in $_ inside BLOCK also results to a change in the element in the LIST. This is accomplished by manipulating the symbol table
local *_ = \$scalar # make $_ an alias of $scalar
And you use local so that when firstidx is done, we haven't clobbered any useful information that was previously in the $_ variable.
I have the following code snippet in Perl:
my $argsize = #args;
if ($argsize >1){
foreach my $a ($args[1..$argsize-1]) {
$a =~ s/(.*[-+*].*)/\($1\)/; # if there's a math operator, put in parens
}
}
On execution I'm getting "Use of unitialized value $. in range (or flip) , followed by Argument "" isn't numeric in array element at... both pointing to the foreach line.
Can someone help me decipher the error message (and fix the problem(s))? I have an array #args of strings. The code should loop through the second to n't elements (if any exist), and surround individual args with () if they contain a +,-, or *.
I don't think the error stems from the values in args, I think I'm screwing up the range somehow... but I'm failing when args has > 1 element. an example might be:
<"bla bla bla"> <x-1> <foo>
The long and short of it is - your foreach line is broken:
foreach my $a (#args[1..$argsize-1]) {
Works fine. It's because you're using a $ which says 'scalar value' rather than an # which says array (or list).
If you use diagnostics you get;
Use of uninitialized value $. in range (or flip) at
(W uninitialized) An undefined value was used as if it were already
defined. It was interpreted as a "" or a 0, but maybe it was a mistake.
To suppress this warning assign a defined value to your variables.
To help you figure out what was undefined, perl will try to tell you
the name of the variable (if any) that was undefined. In some cases
it cannot do this, so it also tells you what operation you used the
undefined value in. Note, however, that perl optimizes your program
and the operation displayed in the warning may not necessarily appear
literally in your program. For example, "that $foo" is usually
optimized into "that " . $foo, and the warning will refer to the
concatenation (.) operator, even though there is no . in
your program.
You can reproduce this error by:
my $x = 1..3;
Which is actually pretty much what you're doing here - you're trying to assign an array value into a scalar.
There's a load more detail in this question:
What is the Perl context with range operator?
But basically: It's treating it as a range operator, as if you were working your way through a file. You would be able to 'act on' particular lines in the file via this operator.
e.g.:
use Data::Dumper;
while (<DATA>) {
my $x = 2 .. 3;
print Dumper $x;
print if $x;
}
__DATA__
line one
another line
third line
fourth line
That range operator is testing line numbers - and because you have no line numbers (because you're not iterating a file) it errors. (But otherwise - it might work, but you'd get some really strange results ;))
But I'd suggest you're doing this quite a convoluted way, and making (potentially?) an error, in that you're starting your array at 1, not zero.
You could instead:
s/(.*[-+*].*)/\($1\)/ for #args;
Which'll have the same result.
(If you need to skip the first argument:
my ( $first_arg, #rest ) = #args;
s/(.*[-+*].*)/\($1\)/ for #rest;
But that error at runtime is the result of some of the data you're feeding in. What you've got here though:
use strict;
use warnings;
my #args = ( '<"bla bla bla">', '<x-1>', '<foo>' );
print "Before #args\n";
s/(.*[-+*].*)/\($1\)/ for #args;
print "After: #args\n";
This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
(9 answers)
Closed 8 years ago.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
#list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "#list[2]\n";
print "%dict{2}\n";
it seems $ + var_name works for both array and hash, but # + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?
#list[2] works because it is a slice of a list.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Except from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
#list = #hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
#list2 = #list[0..3];
gives you the first four items from the array. --> For your case, #list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and # for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{#keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my #list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my #l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = #hash{ #keys } ) == scalar #keys;
Which would make the expression:
$item = #hash{ #keys };
no more valuable than:
scalar #keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = #hash{ #keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = $hash{ source1(), source2(), #array3, $banana, ( map { "$_" } source4()};
slightly easier than writing:
$item = $hash{ [source1(), source2(), #array3, $banana, ( map { "$_" } source4()]->[-1] }
But only slightly.
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.
I am currently documenting all of Perl 5's operators (see the perlopref GitHub project) and I have decided to include Perl 5's pseudo-operators as well. To me, a pseudo-operator in Perl is anything that looks like an operator, but is really more than one operator or a some other piece of syntax. I have documented the four I am familiar with already:
()= the countof operator
=()= the goatse/countof operator
~~ the scalar context operator
}{ the Eskimo-kiss operator
What other names exist for these pseudo-operators, and do you know of any pseudo-operators I have missed?
=head1 Pseudo-operators
There are idioms in Perl 5 that appear to be operators, but are really a
combination of several operators or pieces of syntax. These pseudo-operators
have the precedence of the constituent parts.
=head2 ()= X
=head3 Description
This pseudo-operator is the list assignment operator (aka the countof
operator). It is made up of two items C<()>, and C<=>. In scalar context
it returns the number of items in the list X. In list context it returns an
empty list. It is useful when you have something that returns a list and
you want to know the number of items in that list and don't care about the
list's contents. It is needed because the comma operator returns the last
item in the sequence rather than the number of items in the sequence when it
is placed in scalar context.
It works because the assignment operator returns the number of items
available to be assigned when its left hand side has list context. In the
following example there are five values in the list being assigned to the
list C<($x, $y, $z)>, so C<$count> is assigned C<5>.
my $count = my ($x, $y, $z) = qw/a b c d e/;
The empty list (the C<()> part of the pseudo-operator) triggers this
behavior.
=head3 Example
sub f { return qw/a b c d e/ }
my $count = ()= f(); #$count is now 5
my $string = "cat cat dog cat";
my $cats = ()= $string =~ /cat/g; #$cats is now 3
print scalar( ()= f() ), "\n"; #prints "5\n"
=head3 See also
L</X = Y> and L</X =()= Y>
=head2 X =()= Y
This pseudo-operator is often called the goatse operator for reasons better
left unexamined; it is also called the list assignment or countof operator.
It is made up of three items C<=>, C<()>, and C<=>. When X is a scalar
variable, the number of items in the list Y is returned. If X is an array
or a hash it it returns an empty list. It is useful when you have something
that returns a list and you want to know the number of items in that list
and don't care about the list's contents. It is needed because the comma
operator returns the last item in the sequence rather than the number of
items in the sequence when it is placed in scalar context.
It works because the assignment operator returns the number of items
available to be assigned when its left hand side has list context. In the
following example there are five values in the list being assigned to the
list C<($x, $y, $z)>, so C<$count> is assigned C<5>.
my $count = my ($x, $y, $z) = qw/a b c d e/;
The empty list (the C<()> part of the pseudo-operator) triggers this
behavior.
=head3 Example
sub f { return qw/a b c d e/ }
my $count =()= f(); #$count is now 5
my $string = "cat cat dog cat";
my $cats =()= $string =~ /cat/g; #$cats is now 3
=head3 See also
L</=> and L</()=>
=head2 ~~X
=head3 Description
This pseudo-operator is named the scalar context operator. It is made up of
two bitwise negation operators. It provides scalar context to the
expression X. It works because the first bitwise negation operator provides
scalar context to X and performs a bitwise negation of the result; since the
result of two bitwise negations is the original item, the value of the
original expression is preserved.
With the addition of the Smart match operator, this pseudo-operator is even
more confusing. The C<scalar> function is much easier to understand and you
are encouraged to use it instead.
=head3 Example
my #a = qw/a b c d/;
print ~~#a, "\n"; #prints 4
=head3 See also
L</~X>, L</X ~~ Y>, and L<perlfunc/scalar>
=head2 X }{ Y
=head3 Description
This pseudo-operator is called the Eskimo-kiss operator because it looks
like two faces touching noses. It is made up of an closing brace and an
opening brace. It is used when using C<perl> as a command-line program with
the C<-n> or C<-p> options. It has the effect of running X inside of the
loop created by C<-n> or C<-p> and running Y at the end of the program. It
works because the closing brace closes the loop created by C<-n> or C<-p>
and the opening brace creates a new bare block that is closed by the loop's
original ending. You can see this behavior by using the L<B::Deparse>
module. Here is the command C<perl -ne 'print $_;'> deparsed:
LINE: while (defined($_ = <ARGV>)) {
print $_;
}
Notice how the original code was wrapped with the C<while> loop. Here is
the deparsing of C<perl -ne '$count++ if /foo/; }{ print "$count\n"'>:
LINE: while (defined($_ = <ARGV>)) {
++$count if /foo/;
}
{
print "$count\n";
}
Notice how the C<while> loop is closed by the closing brace we added and the
opening brace starts a new bare block that is closed by the closing brace
that was originally intended to close the C<while> loop.
=head3 Example
# count unique lines in the file FOO
perl -nle '$seen{$_}++ }{ print "$_ => $seen{$_}" for keys %seen' FOO
# sum all of the lines until the user types control-d
perl -nle '$sum += $_ }{ print $sum'
=head3 See also
L<perlrun> and L<perlsyn>
=cut
Nice project, here are a few:
scalar x!! $value # conditional scalar include operator
(list) x!! $value # conditional list include operator
'string' x/pattern/ # conditional include if pattern
"#{[ list ]}" # interpolate list expression operator
"${\scalar}" # interpolate scalar expression operator
!! $scalar # scalar -> boolean operator
+0 # cast to numeric operator
.'' # cast to string operator
{ ($value or next)->depends_on_value() } # early bail out operator
# aka using next/last/redo with bare blocks to avoid duplicate variable lookups
# might be a stretch to call this an operator though...
sub{\#_}->( list ) # list capture "operator", like [ list ] but with aliases
In Perl these are generally referred to as "secret operators".
A partial list of "secret operators" can be had here. The best and most complete list is probably in possession of Philippe Bruhad aka BooK and his Secret Perl Operators talk but I don't know where its available. You might ask him. You can probably glean some more from Obfuscation, Golf and Secret Operators.
Don't forget the Flaming X-Wing =<>=~.
The Fun With Perl mailing list will prove useful for your research.
The "goes to" and "is approached by" operators:
$x = 10;
say $x while $x --> 4;
# prints 9 through 4
$x = 10;
say $x while 4 <-- $x;
# prints 9 through 5
They're not unique to Perl.
From this question, I discovered the %{{}} operator to cast a list as a hash. Useful in
contexts where a hash argument (and not a hash assignment) are required.
#list = (a,1,b,2);
print values #list; # arg 1 to values must be hash (not array dereference)
print values %{#list} # prints nothing
print values (%temp=#list) # arg 1 to values must be hash (not list assignment)
print values %{{#list}} # success: prints 12
If #list does not contain any duplicate keys (odd-elements), this operator also provides a way to access the odd or even elements of a list:
#even_elements = keys %{{#list}} # #list[0,2,4,...]
#odd_elements = values %{{#list}} # #list[1,3,5,...]
The Perl secret operators now have some reference (almost official, but they are "secret") documentation on CPAN: perlsecret
You have two "countof" (pseudo-)operators, and I don't really see the difference between them.
From the examples of "the countof operator":
my $count = ()= f(); #$count is now 5
my $string = "cat cat dog cat";
my $cats = ()= $string =~ /cat/g; #$cats is now 3
From the examples of "the goatse/countof operator":
my $count =()= f(); #$count is now 5
my $string = "cat cat dog cat";
my $cats =()= $string =~ /cat/g; #$cats is now 3
Both sets of examples are identical, modulo whitespace. What is your reasoning for considering them to be two distinct pseudo-operators?
How about the "Boolean one-or-zero" operator: 1&!!
For example:
my %result_of = (
" 1&!! '0 but true' " => 1&!! '0 but true',
" 1&!! '0' " => 1&!! '0',
" 1&!! 'text' " => 1&!! 'text',
" 1&!! 0 " => 1&!! 0,
" 1&!! 1 " => 1&!! 1,
" 1&!! undef " => 1&!! undef,
);
for my $expression ( sort keys %result_of){
print "$expression = " . $result_of{$expression} . "\n";
}
gives the following output:
1&!! '0 but true' = 1
1&!! '0' = 0
1&!! 'text' = 1
1&!! 0 = 0
1&!! 1 = 1
1&!! undef = 0
The << >> operator, for multi-line comments:
<<q==q>>;
This is a
multiline
comment
q