Why are leading-hyphen options permitted on `use` lines without fat comma and with strict? - perl

Why is the following use line legal Perl syntax? (Adapted from the POD for parent; tested on Perl 5.26.2 x64 on Cygwin.)
package MyHash;
use strict;
use Tie::Hash;
use parent -norequire, "Tie::StdHash";
#          ^^^^^^^^^^ A bareword with nothing to protect it!
Under -MO=Deparse, the use line becomes
use parent ('-norequire', 'Tie::StdHash');
but I can't tell from the use docs where the quoting on -norequire comes from.
If use strict were not in force, I would understand it. The bareword norequire would become the string "norequire", the unary minus would turn that string into "-norequire", and the resulting string would go into the use import list. For example:
package MyHash;
use Tie::Hash;
use parent -norequire, "Tie::StdHash";
Similarly, if there were a fat comma, I would understand it. -foo => bar becomes "-foo", bar because => turns foo into "foo", and then the unary minus works its magic again. For example:
package MyHash;
use strict;
use Tie::Hash;
use parent -norequire => "Tie::StdHash";
Both of those examples produce the same deparse for the use line. However, both have quoting that the original example does not. What am I missing that makes the original example (with strict, without =>) legal? Thanks!

You already cited perldoc perlop, but it is relevant here.
Unary - performs arithmetic negation if the operand is numeric, including any string that looks like a number. If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. ... One effect of these rules is that -bareword is equivalent to the string "-bareword".
This behavior of the unary minus operator is applied to the bareword before the strict checks are applied. Therefore, unary minus is a kind of quoting operator that also works in strict mode.
Similarly, barewords as the invocant in method invocation do not need to be quoted as long as they are not a function call:
Foo->bar; # 'Foo'->bar(); --- but only if no sub Foo exists
print->bar; # print($_)->bar();
However, the unary minus behaviour seems to be due to constant folding, not due to a special case in the parser. For example, this code
use strict;
0 ? foo : bar;
will only complain about the bareword "bar" being disallowed, suggesting that the bareword check happens very late during parsing and compilation. In the unary minus case, the bareword will already have been constant-folded into a proper string value at that point, and no bareword remains visible.
While this is arguably buggy, it is also impossible to change without breaking backwards compatibility – and this behaviour is used by many modules such as use parent to communicate options. Compare also similar idioms on command line interfaces, where options usually begin with a dash.
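A minimal sketch of the behaviour described in this answer, assuming a reasonably recent Perl 5. Each snippet is compiled in its own string eval (my choice for illustration, not something from the question) so that a strict failure shows up as an error message instead of stopping the script. The variable names are hypothetical.

```perl
use strict;
use warnings;

# Leading hyphen: unary minus turns the bareword into a string,
# so nothing is left for the strict check to complain about.
my $with_minus = eval q{
    use strict;
    my @opts = (-norequire, "Tie::StdHash");
    $opts[0];
};
my $minus_err = $@;

# No hyphen: the bareword survives to the strict check and is rejected.
my $plain = eval q{
    use strict;
    my @opts = (norequire, "Tie::StdHash");
    $opts[0];
};
my $plain_err = $@;

print "with minus: ", (defined $with_minus ? $with_minus : "error: $minus_err"), "\n";
print "plain:      ", (defined $plain ? "$plain\n" : "error: $plain_err");
```

The first line printed should show the string -norequire, while the second should show the usual "Bareword ... not allowed" message from strict subs.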

From perlop
Symbolic Unary Operators
Unary "-" performs arithmetic negation if the operand is numeric, including any
string that looks like a number. If the operand is an identifier, a string
consisting of a minus sign concatenated with the identifier is returned.
Otherwise, if the string starts with a plus or minus, a string starting with
the opposite sign is returned. One effect of these rules is that -bareword is
equivalent to the string "-bareword". If, however, the string begins with a
non-alphabetic character (excluding "+" or "-"), Perl will attempt to convert
the string to a numeric and the arithmetic negation is performed. If the string
cannot be cleanly converted to a numeric, Perl will give the warning Argument
"the string" isn't numeric in negation (-) at ....
So, because of these rules, Perl treats -name as "-name" even under use strict.

Related

Not able to understand a command in perl

I need help to understand what below command is doing exactly
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
and $abc{hier} contains a path "/home/test1/test2/test3"
Can someone please let me know what the above command is doing exactly. Thanks
s/PATTERN/REPLACEMENT/ is Perl's substitution operator. It searches a string for text that matches the regex PATTERN and replaces it with REPLACEMENT.
By default, the substitution operator works on $_. To tell it to work on a different variable, you use the binding operator - =~.
The default delimiter used by the substitution operator is a slash (/) but you can change that to any other character. This is useful if your PATTERN or your REPLACEMENT contains a slash. In this case, the programmer has used # as the delimiter.
To recap:
$abc{hier} =~ s#PATTERN#REPLACEMENT#;
means "look for text in $abc{hier} that matches PATTERN and replace it with REPLACEMENT".
The substitution operator also has various options that change its behaviour. They are added by putting letters after the final delimiter. In this case we have a g. That means "make the substitution global" - or match and change all occurrences of PATTERN.
In your case, the REPLACEMENT string is empty (we have two # characters next to each other). So we're replacing the PATTERN with nothing - effectively deleting whatever matches PATTERN.
So now we have:
$abc{hier} =~ s#PATTERN##g;
And we know it means, "in the variable $abc{hier}, look for any string that matches PATTERN and replace it with nothing".
The last thing to look at is the PATTERN (or regular expression - "regex"). You can get the full definition of regexes in perldoc perlre. But to explain what we're using here:
/tools : is the fixed string "/tools"
.* : is zero or more of any character
/dfII : is the fixed string "/dfII"
/? : is an optional slash character
.* : is (again) zero or more of any character
So, basically, we're removing bits of a file path from a value that's stored in a hash.
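To see the effect, here is a small sketch that applies the same substitution to two values. The first path is a hypothetical one that contains /tools.../dfII; the second is the path quoted in the question, which does not match the pattern and so should come back unchanged.

```perl
use strict;
use warnings;

# Hypothetical hash contents for illustration.
my %abc = (
    hier  => "/home/user/tools/v6/dfII/bin/run",
    other => "/home/test1/test2/test3",
);

my %result;
for my $key (sort keys %abc) {
    # Copy the value first, then run the substitution on the copy.
    ($result{$key} = $abc{$key}) =~ s#/tools.*/dfII/?.*##g;
    print "$key: $abc{$key} -> $result{$key}\n";
}
```

Working on a copy keeps the original value around for comparison; the command in the question modifies $abc{hier} in place.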
This =~ means "Do a regex operation on that variable."
(Actually, as ikegami correctly reminds me, it is not necessarily only regex operations, because it could also be a transliteration.)
The operation in question is s#something#else#, which means replace the "something" with something "else".
The g at the end means "Do it for all occurrences of something."
Since the "else" is empty, the replacement has the effect of deleting.
The "something" is a definition according to regex syntax, roughly it means "Starting with '/tools' and later containing '/dfII', followed pretty much by anything until the end."
Note that the regex ends with /?.*. In detail, this means "a slash (/), or maybe not (?), and then absolutely anything (.) any number of times, including zero times (*)". Strictly speaking, it is not necessary to specify "slash or not" when it is followed by "anything, any number of times", because "anything" includes a slash and "any number of times" includes zero or one time. That is, the /? could be omitted without changing the behaviour.
(Thanks, ikegami, for confirming.)
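A quick way to check the claim about /? is to run both versions of the pattern over a few inputs and compare. The sample paths below are hypothetical, picked to cover "slash after dfII", "no slash", "other text after dfII" and "no match at all".

```perl
use strict;
use warnings;

# Hypothetical sample paths.
my @paths = (
    "/proj/tools/a/dfII/bin",
    "/proj/tools/a/dfII",
    "/proj/tools/a/dfIIextra",
    "/home/test1/test2/test3",
);

my $mismatches = 0;
for my $path (@paths) {
    (my $with    = $path) =~ s#/tools.*/dfII/?.*##g;   # pattern from the question
    (my $without = $path) =~ s#/tools.*/dfII.*##g;     # same pattern without /?
    $mismatches++ if $with ne $without;
    print "$path -> '$with' vs '$without'\n";
}
print "mismatches: $mismatches\n";
```

This only samples a handful of strings, so it illustrates the argument rather than proving it.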
$abc{hier} =~ s#/tools.*/dfII/?.*##g;
The above command uses a regular expression to remove everything from /tools.*/dfII onward (that is, text matching /tools.*/dfII or
/tools.*/dfII/.*) from the value of the hier member of the %abc hash.
It is pretty basic Perl, apart from the non-standard regular expression delimiters (# instead of the standard /). They avoid having to escape / inside the regular expression (s/\/tools.*\/dfII\/?.*//g).
My personal preferred style-guide would make it s{/tools.*/dfII/?.*}{}g .
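As a sketch of the point about delimiters, the three spellings mentioned in this answer can be run side by side on the same input. The path is a hypothetical example.

```perl
use strict;
use warnings;

my $path = "/proj/tools/a/dfII/bin";   # hypothetical input

(my $hash_delim  = $path) =~ s#/tools.*/dfII/?.*##g;       # as in the question
(my $slash_delim = $path) =~ s/\/tools.*\/dfII\/?.*//g;    # default delimiter, slashes escaped
(my $brace_delim = $path) =~ s{/tools.*/dfII/?.*}{}g;      # brace delimiters

print "$_\n" for $hash_delim, $slash_delim, $brace_delim;
```

All three lines printed should be identical, since only the delimiters differ.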

Perl :what does "-" means in perl

What does '-' mean in the parameters?
$cgi->start_html(-title => uc($color), -BGCOLOR => $color);
I only know it from hashes, but here it appears in the parameters of a sub call, which confuses me. I have searched for a long time.
Whenever you come across confusing syntax in Perl, a handy tool is the -MO=Deparse option. This causes Perl to check the syntax of a script and output the script in a normalized form, rather than executing it.
So if I do
perl -MO=Deparse -e '$cgi->start_html(-title => uc($color), -BGCOLOR => $color);'
I get a result of:
$cgi->start_html(-'title', uc $color, -'BGCOLOR', $color);
-e syntax OK
There are three differences here:
Quotes were added to title and BGCOLOR.
The => operators changed to commas.
The parentheses disappeared from uc($color).
The first two are the normal effects of the => ("fat comma") operator: It's equivalent to a comma, except that if the thing to the left is an identifier (starting with a letter or underscore and containing only alphanumeric characters and underscores), that identifier becomes a quoted string.
And the parentheses after uc just aren't strictly necessary in this situation, since the builtin function uc is prototyped to take 0 or 1 arguments.
But now we have -'title' and -'BGCOLOR', so what's the negative of a string? Checking perldoc perlop, we see that unary minus follows the rules:
If the operand is a number or a string representation of a number, does an arithmetic negation.
Otherwise, if the string starts with '+' or '-', switches just the first character of the string to the opposite sign.
Otherwise, if the string starts with a letter, adds a '-' to the beginning of the string.
Otherwise, attempts to convert the string to a number, probably prints a warning if warnings are enabled, and then does an arithmetic negation.
Here we have case 3, so -'title' is '-title' and -'BGCOLOR' is '-BGCOLOR'.
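The first three rules can be tried out directly. This is a small sketch with hypothetical operands; the fourth rule is left out because it would emit a warning under use warnings.

```perl
use strict;
use warnings;

my %negated;
for my $operand ("10", "-title", "+title", "title") {
    # Unary minus applied to a string held in a variable.
    $negated{$operand} = -$operand;
    print "-'$operand' is '$negated{$operand}'\n";
}
```

The numeric-looking string is negated arithmetically, the strings with a leading sign have that sign flipped, and the plain word gets a hyphen prepended.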
So presumably the start_html method expects a list of arguments which come in key-value pairs, and the key strings are supposed to start with hyphens. (It might or might not internally use these arguments to create a hash, with a line like my %options = @_;.)
This is all a little roundabout, plus you'd get confusing results if you ever tried passing something like -3zzz => $value. So I'd personally add explicit quotes here to make it obvious what's being passed, but keep using the fat commas anyway to emphasize the arguments are meant to be key/value pairs:
$cgi->start_html('-title' => uc($color), '-BGCOLOR' => $color);
It has no effect here. It's just treated as part of the string. I assume that the original author of CGI.pm wanted to make the options look more like command-line options. I think that was a terrible idea.
It's a string literal, just like "-title" or "-BGCOLOR".
perldoc perlop:
[Unary "-" ...] If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned.
In other words, -"foo" is "-foo".
The => operator (sometimes pronounced "fat comma") is a synonym for the comma except that it causes a word on its left to be interpreted as a string if it begins with a letter or underscore and is composed only of letters, digits and underscores.
In other words, foo => 42 is "foo", 42.
Taken together, this means -title => uc($color) is "-title", uc($color).
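To make that concrete, here is a sketch with a hypothetical sub, collect_options, standing in for a method that takes hyphenated key/value arguments. It is not CGI.pm's actual implementation; it only shows what the callee receives.

```perl
use strict;
use warnings;

# Hypothetical stand-in: collect the key/value arguments into a hash.
sub collect_options {
    my %options = @_;
    return \%options;
}

my $color = "red";
my $opts  = collect_options(-title => uc($color), -BGCOLOR => $color);

# The keys arrive as ordinary strings that start with a hyphen.
print "$_ => $opts->{$_}\n" for sort keys %$opts;
```

Writing the call with explicit quotes, as in '-title' => uc($color), should produce the same hash.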

Difference between /.../ and m/.../ in Perl

What is difference between /.../ and m/.../?
use strict;
use warnings;
my $str = "This is a testing for modifier";
if ($str =~ /This/i) { print "Modifier...\n"; }
if ($str =~ m/This/i) { print "W/O Modifier...\n"; }
I checked this site for reference, but I still do not clearly understand the theory.
There's no difference. If you just supply /PATTERN/ then it assumes m. However, if you're using an alternative delimiter, you need to supply the m. E.g. m|PATTERN| won't work as |PATTERN|.
In your example, i is the modifier as it's after the pattern. m is the operation. (as opposed to s, tr, y etc.)
Perhaps slightly confusingly - you can use m as a modifier, but only if you put it after the match.
m/PATTERN/m will cause ^ and $ to match differently than in m/PATTERN/, but it's the trailing m that does this, not the leading one.
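A short sketch of both roles of m, using a hypothetical two-line string: the leading m that allows another delimiter, and the trailing m that changes how ^ behaves.

```perl
use strict;
use warnings;

my $text = "first line\nsecond line";   # hypothetical sample

my $plain     = ($text =~ /^second/)  ? 1 : 0;   # ^ matches only at the start of the string
my $multiline = ($text =~ /^second/m) ? 1 : 0;   # trailing m: ^ also matches after a newline
my $alt_delim = ($text =~ m|^first|)  ? 1 : 0;   # leading m: needed for the | delimiter

print "plain=$plain multiline=$multiline alt_delim=$alt_delim\n";
```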
Perl has a number of quote-like operators where you can choose the delimiter to suit the data you're passing to the operator.
q(...) creates a single-quoted string
qq(...) creates a double-quoted string
qw(...) creates a list by splitting its arguments on white-space
qx(...) executes a command and returns the output
qr(...) compiles a regular expression
m(...) matches its argument as a regular expression
(There's also s(...)(...) but I've left that off the list as it has two arguments)
For some of these, you can omit the letter at the start of the operator if you choose the default delimiter.
You can omit q if you use single quote characters ('...').
You can omit qq if you use double quote characters ("...").
You can omit qx if you use backticks (`...`).
You can omit m if you use slashes (/.../).
So, to answer your original question, m/.../ and /.../ are the same, but because slashes are the default delimiter for the match operator, you can omit the m.
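The equivalences in the list above can be checked with a few comparisons. This is only a sketch; qx is left out so that nothing is executed, and the sample strings are hypothetical.

```perl
use strict;
use warnings;

my $name = "world";

# Each entry: a label and whether the two spellings gave the same result.
my @checks = (
    [ 'q(...) vs single quotes',  q(hello $name)  eq 'hello $name' ],
    [ 'qq(...) vs double quotes', qq(hello $name) eq "hello $name" ],
    [ 'qw(...) vs a plain list',  join(",", qw(a b c)) eq join(",", 'a', 'b', 'c') ],
    [ 'm(...) vs /.../',          ("hello" =~ m(ell) ? 1 : 0) == ("hello" =~ /ell/ ? 1 : 0) ],
);

printf "%-26s %s\n", $_->[0], ($_->[1] ? "same" : "different") for @checks;
```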

Why aren't my nested lookarounds working correctly in my Perl substitution?

I have a Perl substitution which converts hyperlinks to lowercase:
's/(?<=<a href=")([^"]+)(?=")/\L$1/g'
I want the substitution to ignore any links which begin with a hash, for example I want it to change the path in Foo Bar to lowercase but skip if it comes across Bar.
Nesting lookaheads to instruct it to skip these links isn't working correctly for me. This is the one-liner I've written:
perl -pi -e 's/(?<=<a href=" (?! (?<=<a href="#) ) )([^"]+)(?=")/\L$1/g' *;
Could anyone hint to me where I have gone wrong with this substitution? It executes just fine, but does not do anything.
As near as I can tell, your initial regex will work just fine, if you add the condition that the first character in the link may not be a hash # or a double quote, e.g. [^#"]
s/(?<=<a href=")([^#"][^"]+)(?=")/\L$1/gi;
If a link contains a hash but does not start with one, e.g. Foo Bar, it becomes slightly more complicated:
s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;
We now have to evaluate the substitution, since otherwise we get undefined variable warnings when the optional anchor reference is not present.
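Here is a sketch that runs the second substitution on some hypothetical sample markup with three kinds of link: a plain path, a path followed by an anchor, and an anchor-only link.

```perl
use strict;
use warnings;
use 5.010;   # for the // (defined-or) operator

# Hypothetical sample markup.
my $html = '<a href="/Foo/BAR">one</a> '
         . '<a href="/Foo/BAR#Section">two</a> '
         . '<a href="#Top">three</a>';

# Lowercase the path part, keep any trailing #anchor as it is.
(my $lowered = $html) =~ s{(?<=<a href=")([^#"]+)(#[^"]+)*(?=")}{ lc($1) . ($2 // "") }gei;

print "$lowered\n";
```

The paths should come out in lowercase, while the #Section suffix and the #Top link keep their original case.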
You don't need look-arounds, from what I see
use 5.010;
...
s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;
\K means "keep" everything before it. It amounts to a variable-length look-behind.
perlre:
For various reasons \K may be significantly more efficient than the equivalent (?<=...) construct, and it is especially useful in situations where you want to efficiently remove something following something else in a string.
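A sketch of the \K version on hypothetical markup. The extra spaces around href and = are there to show that the part before \K can vary in length.

```perl
use strict;
use warnings;
use 5.010;

# Hypothetical sample markup with irregular spacing in the first tag.
my $html = '<a   href = "/Foo/BAR">one</a> <a href="#Top">two</a>';

# Everything matched before \K is kept; only the link text and the
# closing quote are replaced.
(my $lowered = $html) =~ s/<a \s+ href \s* = \s* "\K([^#"][^"]*)"/\L$1"/gx;

print "$lowered\n";
```

The first link should be lowercased with its spacing preserved, and the link starting with a hash should be left alone.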

What's the difference between 'eq' and '=~' in Perl?

What is the difference between these two operators? Specifically, what difference in $a will lead to different behavior between the two?
$a =~ /^pattern$/
$a eq 'pattern'
eq is for testing string equality; == is the same thing but for numerical equality.
The =~ operator is for applying a regular expression to a scalar.
For the gory details of every Perl operator and what they're for, see the perldoc perlop manpage.
As others have noted, ($a =~ /^pattern$/) uses the regular expression engine to evaluate whether the strings are identical, whereas ($a eq 'pattern') is the plain string equality test.
If you really only want to know whether two strings are identical, the latter is preferred for reasons of:
Readability - It is more concise, containing fewer special characters.
Maintainability - With a regex pattern, you must escape any special characters that may appear in your string, or use extra markers such as \Q and \E. With a single-quoted string, the only character you need to escape is a single quote. (You also have to escape backslashes if they are followed by another backslash or the string delimiter.)
Performance - You don't incur the overhead of firing up the regex engine just to compare a string. If this happens several million times in your program, for example, the benefit is notable.
On the other hand, the regex form is far more flexible if you need to do something other than a plain string equality test. See perldoc perlre for more on regular expressions.
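The maintainability point can be seen with a literal that contains a regex metacharacter. In this sketch the values are hypothetical; the dot in 'a.c' matches any character unless it is escaped.

```perl
use strict;
use warnings;

my $wanted = 'a.c';   # hypothetical literal containing a metacharacter
my $input  = 'abc';

my $unescaped = ($input =~ /^$wanted\z/)     ? 1 : 0;   # dot acts as a wildcard
my $escaped   = ($input =~ /^\Q$wanted\E\z/) ? 1 : 0;   # \Q...\E makes it literal
my $string_eq = ($input eq $wanted)          ? 1 : 0;   # plain string comparison

print "unescaped=$unescaped escaped=$escaped eq=$string_eq\n";
```

Only the unescaped regex reports a match, which is the kind of surprise that eq avoids.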
EDIT: Like most everyone else before ysth, I missed the obvious functional difference between them and went straight for more abstract differences. I've clarified the question but I'll leave the answer as a (hopefully) useful reference.
eq -- Tests for string equality.
=~ -- Binds a scalar expression to a pattern match.
See here for more in-depth descriptions of all of the operators.
"pattern\n" :)
$a = "pattern\n";
print "ok 1\n" if $a =~ /^pattern$/;
print "ok 2\n" if $a eq 'pattern';
Perhaps you meant /^pattern\z/.
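A sketch comparing the three tests on the value with a trailing newline. The hash and variable names are hypothetical.

```perl
use strict;
use warnings;

my $value = "pattern\n";

my %matches = (
    '/^pattern$/'   => ($value =~ /^pattern$/  ? 1 : 0),   # $ allows a final newline
    '/^pattern\z/'  => ($value =~ /^pattern\z/ ? 1 : 0),   # \z is the very end of the string
    q{eq 'pattern'} => ($value eq 'pattern'    ? 1 : 0),
);

print "$_: $matches{$_}\n" for sort keys %matches;
```

Only the pattern ending in $ accepts the string; \z and eq both treat the newline as a difference.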
=~ is the binding operator. It is used to bind a value to either a pattern match (m//), a substitution (s///), or a transliteration (tr// or y//).
eq is the string equality operator; it compares two values to determine whether or not they're equal when considered as strings. There is a peer == operator that does the same thing only considering the values as numbers. (In Perl, strings and numbers are mostly interchangeable with conversions happening automatically depending on how the values are used. Because of this, when you want to compare two values you must specify the type of comparison to perform.)
In general, $var =~ m/.../ determines whether or not the value of $var matches a pattern, not whether it equals a particular value. However, in this case the pattern is anchored at both ends and contains nothing but literal characters, so it's equivalent to a string comparison. It's better to use eq here because it's clearer and faster.
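As a sketch of why the comparison type has to be chosen explicitly, here are two hypothetical values that are equal as numbers but not as strings.

```perl
use strict;
use warnings;

my ($x, $y) = ("1.0", "1");   # hypothetical values

my $as_strings = ($x eq $y) ? 1 : 0;   # compares the characters
my $as_numbers = ($x == $y) ? 1 : 0;   # compares the numeric values

print "eq: $as_strings, ==: $as_numbers\n";
```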