When is Perl's scalar comma operator useful? - perl

Is there any reason to use a scalar comma operator anywhere other than in a for loop?

Since the Perl scalar comma is a "port" of the C comma operator, these comments are probably apropos:
Once in a while, you find yourself in
a situation in which C expects a
single expression, but you have two
things you want to say. The most
common (and in fact the only common)
example is in a for loop, specifically
the first and third controlling
expressions. What if (for example) you
want to have a loop in which i counts
up from 0 to 10 at the same time that
j is counting down from 10 to 0?
So, your instinct that it's mainly useful in for loops is a good one, I think.

I occasionally use it in the conditional (sometimes erroneously called "the ternary") operator, if the code is easier to read than breaking it out into a real if/else:
my $blah = condition() ? do_this(), do_that() : do_the_other_thing();
It could also be used in some expression where the last result is important, such as in a grep expression, but in this case it's just the same as if a semicolon was used:
my @results = grep { setup(), condition() } @list;
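A minimal sketch of both uses, with made-up helpers (note and setup are hypothetical, invented only to record the order of evaluation). It parenthesizes the comma expression inside the conditional so the grouping is explicit:

```perl
use strict;
use warnings;

my @log;    # records the order in which the helpers run

# Hypothetical helpers, invented for this example.
sub note  { push @log, $_[0]; return $_[1] }
sub setup { push @log, 'setup'; return 0 }    # returns false on purpose

# In scalar context the comma operator evaluates its left operand, throws the
# result away, then evaluates and returns its right operand.
my $value = (note('first', 10), note('second', 20));

# Inside the conditional operator: both calls run, the last one supplies the value.
my $blah = 1 ? (note('this', 'a'), note('that', 'b')) : note('other', 'c');

# In a grep block only the last expression decides; setup()'s false return is ignored.
my @evens = grep { setup(), $_ % 2 == 0 } 1 .. 4;

print "$value $blah @evens\n";    # 20 b 2 4
```

The side effects of the left operands still happen (the log ends up with four setup entries), which is the only reason to write them there at all.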


Why are @@, @!, @, etc. not interpolated in strings?

First, please note that I ask this question out of curiosity, and I'm aware that using variable names like @@ is probably not a good idea.
When using double quotes (or the qq operator), scalars and arrays are interpolated:
$v = 5;
say "$v"; # prints: 5
$@ = 6;
say "$@"; # prints: 6
@a = (1,2);
say "@a"; # prints: 1 2
Yet, with array names of the form @ + special char, like @@, @!, @,, @%, @; etc., the array isn't interpolated:
@; = (1,2);
say "@;"; # prints: @; (not interpolated)
say @; ; # prints: 12
So here is my question: does anyone know why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push@a,1;say"@a"}{say@a' shorter by doing instead perl -nE 'push@;,1;say"@;"}{say@', because that last ; converts say@ to say@;. Well, actually I can't do that because @; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)
Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for @- and @+, added in 5.6 (2000)).
The code in S_scan_const only interprets @ as the start of an array if the following character is
a word character (e.g. @x, @_, @1), or
a : (e.g. @::foo), or
a ' (e.g. @'foo (this is the old syntax for ::)), or
a { (e.g. @{foo}), or
a $ (e.g. @$foo), or
a + or - (the arrays @+ and @-), but not in regexes.
As you can see, the only punctuation arrays that are supported are @- and @+, and even then not inside a regex. Initially no punctuation arrays were supported; @- and @+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c@-\c_]/ work; it used to interpolate @- first.)
There is a workaround: Because @{ is treated as the start of an array variable, the syntax "@{;}" works (but that doesn't help your golf code because it makes the code longer).
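A small sketch of the difference, assuming a reasonably recent perl. The variable names $plain and $braced are only for illustration:

```perl
use strict;
use warnings;

# Punctuation variables such as @; are exempt from "use strict 'vars'".
@; = (1, 2);

my $plain  = "@;";      # '@' followed by ';' is left alone as literal text
my $braced = "@{;}";    # '@{' starts an array, so this interpolates @;

print "plain:  $plain\n";
print "braced: $braced\n";
```

The first string should come out as the two literal characters, the second as the array elements joined by a space (the default list separator).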
Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."

Regular expression repetition: how to match expressions of variable lengths?

Essentially, here's what I want to do:
if ($expression =~ /^\d{num}\w{num}$/)
{
#doSomething
}
where num is not an identifier, but could stand for any integer greater than 0 (\d and \w were arbitrarily chosen). I want to match a string iff it contains two groups of related characters, one group immediately followed by the other, and the number of characters in each group is the same.
For this example, 123abc and 021202abcdef would match, but 43abc would not, neither would 12ab3c or 1234acbcde.
Don’t think of the string as growing from left to right, but rather from the outside in:
xy
x(xy)y
xx(xy)yy
Your regex would then be something like:
/^(x(?1)?y)$/
Where (?1) is a reference to the outer pair of parentheses. ? makes it optional in order to give a “base case” of sorts to the recursive match. This is probably the simplest example of how regexes can be used to match context-free grammars—though it’s generally easier to get right with a parser generator or parser combinator library.
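Applied to the question's strings, that could look roughly like this. It is a sketch that assumes the second group is letters only ([[:alpha:]] rather than \w, since \w also matches digits) and a perl new enough for recursive patterns (5.10 or later):

```perl
use strict;
use warnings;

# One digit, optionally the whole group again, then one letter.
# (?1) recurses into capture group 1; the trailing ? supplies the base case.
my $balanced = qr/^(\d(?1)?[[:alpha:]])$/;

for my $s (qw(123abc 021202abcdef 43abc 12ab3c 1234acbcde)) {
    printf "%-13s %s\n", $s, $s =~ $balanced ? 'match' : 'no match';
}
```

Only the first two strings match: every digit consumed on the way in has to be paired with a letter on the way out.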
Well, there's
if ($expression =~ /^(\d+)([[:alpha:]]+)$/ && length($1)==length($2))
{
#doSomething
}
A regex isn't always the best option.

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op, so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
@list_accumulator = ();
for $n (74, …, -34) {
    $. += $n;
    push @list_accumulator, chr($.)
}
print(join("", @list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
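To see where the numbers come from, here is a rough sketch of the reverse direction: turning a string into that list of differences and back again. The subs encode_deltas and decode_deltas are names made up for this example, and a lexical running total stands in for $.:

```perl
use strict;
use warnings;

# Hypothetical helper: the first character code, then the difference between
# each following code and the one before it.
sub encode_deltas {
    my ($text) = @_;
    my $prev = 0;
    return map { my $d = $_ - $prev; $prev = $_; $d } unpack 'c*', $text;
}

# Hypothetical helper: rebuild the string by keeping a running total.
sub decode_deltas {
    my $total = 0;
    return pack 'c*', map { $total += $_ } @_;
}

my @deltas = encode_deltas("Just another Last.fm hacker,\n");
print "@deltas[0 .. 4]\n";       # 74 43 -2 1 -84
print decode_deltas(@deltas);    # Just another Last.fm hacker,
```

The trailing -34 in the one-liner is the step from the comma (44) down to a newline (10).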
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!@#%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template in pack treats every value in the list as a (signed) char, that is, each number is turned into the character with that code.
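A quick way to convince yourself, as a sketch using just the first four absolute codes (74, 117, 115, 116) rather than the full list:

```perl
use strict;
use warnings;

# q with a comma as delimiter is just another spelling of a single-quoted string.
my $template = q,c*,;
print $template eq 'c*' ? "same string\n" : "different\n";

# The unary plus has no effect on the value; here it only separates pack from q.
my $packed = pack+q,c*,,74,117,115,116;
print "$packed\n";    # Just
```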
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);

What should the ... operators be called?

The ... operators are identical to the range operator (..) in list context and nearly identical to the flip-flop operator (..) in scalar context, but calling them the range operator and the flip-flop operator seems wrong since those names are more commonly associated with .., which has slightly different behavior (in scalar context at least).
For now, I am calling them the alternate range/flip-flop operator.
Since ... is identical to .. in list context I'd call it the same thing: the range operator. Giving it another name would imply that it does something different. If I needed to distinguish it from .. for some reason I'd probably call it the "three-dot syntax for the range operator."
If I wanted to mess with people I'd tell them that it's "for really long ranges." ;)
In scalar context I've generally called ... the "sed-like flip-flop operator" because of the reference to sed behavior in the documentation, but I don't like that for a name. How about the "long flip-flop" operator? The mnemonic is that ... is one dot longer and takes one more cycle to evaluate the right operand.
I like to think of it as the ellipses operator, which makes it clear that it's about multiple dots (".." or "...") and causes less confusion about its function.
In 5.11, where a term is expected (due to a bug, currently only at the beginning of a statement), ... is the yada yada operator.
Otherwise, in list context, ... is the range operator (though I would regard it as code smell, since the code seems to be wanting something different than .. but isn't in fact any different).
Otherwise, it is the flip-flop operator, one flavor thereof. If I had to give it an adjective, I would say the sed-like flip-flop operator. In the perl6 spec it (well, the fff replacement, anyway) is called "flipflop (sed style)". If I wanted to give it a name not based on another language, I'd start by getting the perl6 spec updated, then update the perl5 doc.
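Whatever name you pick, the scalar-context difference is easy to show. A minimal sketch with made-up input lines, where the start and end patterns both match the same line:

```perl
use strict;
use warnings;

# Invented sample data: the second line matches both START and END.
my @lines = ('a', 'START END', 'b', 'END', 'c');

my (@two, @three);
for (@lines) {
    # .. tests the right operand on the same line that turned it on (awk style).
    push @two, $_ if /START/ .. /END/;
}
for (@lines) {
    # ... waits until the next evaluation before testing the right operand (sed style).
    push @three, $_ if /START/ ... /END/;
}

print join(' | ', @two),   "\n";    # START END
print join(' | ', @three), "\n";    # START END | b | END
```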

Why do the '<' and 'lt' operators return different results in Perl?

I am just learning Perl's comparison operators. I tried the below code :-
$foo=291;
$bar=30;
if ($foo < $bar) {
print "$foo is less than $bar (first)\n";
}
if ($foo lt $bar) {
print "$foo is less than $bar (second)\n";
}
The output is 291 is less than 30 (second). Does this mean the lt operator always converts the variables to string and then compare? What is the rationale for Perl making lt operator behave differently from the < operator?
Thanks,
Your guess is right. The alphabetic operators like lt compare the variables as strings whereas the symbolic ones like < compare them as numbers. You can read the perlop man page for more details.
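For illustration, a short sketch with the values from the question, plus the matching pair of sort comparators (<=> and cmp), which split along the same numeric/string line:

```perl
use strict;
use warnings;

my ($foo, $bar) = (291, 30);

# < compares the values as numbers, lt compares them as strings.
print "numeric: ", ($foo <  $bar ? 'less' : 'not less'), "\n";    # not less
print "string:  ", ($foo lt $bar ? 'less' : 'not less'), "\n";    # less, "2" sorts before "3"

# The same split applies to sorting: <=> is numeric, cmp is stringwise.
my @nums = (291, 30, 4);
my @numeric = sort { $a <=> $b } @nums;
my @string  = sort { $a cmp $b } @nums;
print "@numeric\n";    # 4 30 291
print "@string\n";     # 291 30 4
```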
The rationale is that scalars in Perl are not typed, so without you telling it Perl would not know how to compare two variables. If it did guess then it would sometimes get it wrong, which would lead to having to do things like ' ' + $a < ' ' + $b to force string comparison, which is probably worse than lt.
That said this is a horrible gotcha which probably catches out everyone new to Perl and still catches me out when coming back to Perl after some time using a less post-modern language.
Since Perl is loosely typed, and values can silently convert between strings and integers at any moment, Perl needs two different types of comparison operators to distinguish between integer comparison (<) and string comparison (lt). If you only had one operator, how would you tell the difference?
Rationale? It's a string operator. From "perldoc perlop":
Binary "lt" returns true if the left argument is stringwise less than the right argument.
If that's not what you want, don't use it.
lt compares values lexically (i.e. in ASCII/Unicode or locale order) and < compares values numerically. Perl has both operators for the same reason "10" + 5 is 15 rather than a type error: it is weakly typed. You must always tell the computer something unambiguous. Languages that are strongly typed tend to use casting to resolve ambiguity, whereas weakly typed languages tend to use lots of operators. The Python (a strongly typed language) equivalent to "10" + 5 is float("10") + 5.
Does this mean the 'lt' operator
always converts the variables to
string and then compare?
Yes, see perlop
What is the rationale for Perl making
'lt' operator behave differently from
'<' operator?
Because having a numeric comparison operator and a string comparison operator makes a lot more sense than having a mumble mumble operator and another, identical mumble mumble operator.