Related
First, please note that I ask this question out of curiosity, and I'm aware that using variable names like ## is probably not a good idea.
When using doubles quotes (or qq operator), scalars and arrays are interpolated :
$v = 5;
say "$v"; # prints: 5
$# = 6;
say "$#"; # prints: 6
#a = (1,2);
say "#a"; # prints: 1 2
Yet, with array names of the form #+special char like ##, #!, #,, #%, #; etc, the array isn't interpolated :
#; = (1,2);
say "#;"; # prints nothing
say #; ; # prints: 1 2
So here is my question : does anyone knows why such arrays aren't interpolated? Is it documented anywhere?
I couldn't find any information or documentation about that. There are too many articles/posts on google (or SO) about the basics of interpolation, so maybe the answer was just hidden in one of them, or at the 10th page of results..
If you wonder why I could need variable names like those :
The -n (and -p for that matter) flag adds a semicolon ; at the end of the code (I'm not sure it works on every version of perl though). So I can make this program perl -nE 'push#a,1;say"#a"}{say#a' shorter by doing instead perl -nE 'push#;,1;say"#;"}{say#', because that last ; convert say# to say#;. Well, actually I can't do that because #; isn't interpolated in double quotes. It won't be useful every day of course, but in some golfing challenges, why not!
It can be useful to obfuscate some code. (whether obfuscation is useful or not is another debate!)
Unfortunately I can't tell you why, but this restriction comes from code in toke.c that goes back to perl 5.000 (1994!). My best guess is that it's because Perl doesn't use any built-in array punctuation variables (except for #- and #+, added in 5.6 (2000)).
The code in S_scan_const only interprets # as the start of an array if the following character is
a word character (e.g. #x, #_, #1), or
a : (e.g. #::foo), or
a ' (e.g. #'foo (this is the old syntax for ::)), or
a { (e.g. #{foo}), or
a $ (e.g. #$foo), or
a + or - (the arrays #+ and #-), but not in regexes.
As you can see, the only punctuation arrays that are supported are #- and #+, and even then not inside a regex. Initially no punctuation arrays were supported; #- and #+ were special-cased in 2000. (The exception in regex patterns was added to make /[\c#-\c_]/ work; it used to interpolate #- first.)
There is a workaround: Because #{ is treated as the start of an array variable, the syntax "#{;}" works (but that doesn't help your golf code because it makes the code longer).
Perl's documentation says that the result is "not strictly predictable".
The following, from perldoc perlop (Perl 5.22.1), refers to interpolation of scalars. I presume it applies equally to arrays.
Note also that the interpolation code needs to make a decision on
where the interpolated scalar ends. For instance, whether
"a $x -> {c}" really means:
"a " . $x . " -> {c}";
or:
"a " . $x -> {c};
Most of the time, the longest possible text that does not include
spaces between components and which contains matching braces or
brackets. because the outcome may be determined by voting based on
heuristic estimators, the result is not strictly predictable.
Fortunately, it's usually correct for ambiguous cases.
Some things are just because "Larry coded it that way". Or as I used to say in class, "It works the way you think, provided you think like Larry thinks", sometimes adding "and it's my job to teach you how Larry thinks."
Basic syntax tutorials I followed do not make this clear:
Is there any practical/philosophical/context-dependent/tricky difference between accessing an array using the former or latter subscript notation?
$ perl -le 'my #a = qw(io tu egli); print $a[1], #a[1]'
The output seems to be the same in both cases.
$a[...] # array element
returns the one element identified by the index expression, and
#a[...] # array slice
returns all the elements identified by the index expression.
As such,
You should use $a[EXPR] when you mean to access a single element in order to convey this information to the reader. In fact, you can get a warning if you don't.
You should use #a[LIST] when you mean to access many elements or a variable number of elements.
But that's not the end of the story. You asked for practical and tricky (subtle?) differences, and there's one noone mentioned yet: The index expression for an array element is evaluated in scalar context, while the index expression for an array slice is evaluated in list context.
sub f { return #_; }
$a[ f(4,5,6) ] # Same as $a[3]
#a[ f(4,5,6) ] # Same as $a[4],$a[5],$a[6]
If you turn on warnings (which you always should) you would see this:
Scalar value #a[0] better written as $a[0]
when you use #a[1].
The # sigil means "give me a list of something." When used with an array subscript, it retrieves a slice of the array. For example, #foo[0..3] retrieves the first four items in the array #foo.
When you write #a[1], you're asking for a one-element slice from #a. That's perfectly OK, but it's much clearer to ask for a single value, $a[1], instead. So much so that Perl will warn you if you do it the first way.
The first yields a scalar variable while the second gives you an array slice .... Very different animals!!
So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I though of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (apart from forcing a scalar context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
#list_accumulator = ();
for $n in (74, …, -34) {
$. += $n;
push #list_accumulator, chr($.)
}
print(join("", #list_accumulator))
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)#.)#_*([]#!#/)(#)#-#),#(##+#)'
^'][)#]`}`]()`#.#]#%[`}%[#`#!##%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 42 + (-2 ) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means add the current list element ($_) to the variable $.. This will return a new array with the correct ASCII values for the new sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* modifier in pack interprets the value as characters. For example, it will use the value and interpret it as a character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my #relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my #absolute = map($prev_value += $_, #relative);
print pack("c*", #absolute);
my $mind = ( 'a', 'little', 'confused' );
And it's because perldoc perlfaq4 explains the line above as follows (emphasis added):
Since you're assigning to a scalar, the righthand side is in scalar
context. The comma operator (yes, it's an operator!) in scalar context
evaluates its lefthand side, throws away the result, and evaluates
it's righthand side and returns the result. In effect, that
list-lookalike assigns to $scalar it's rightmost value. Many people
mess this up because they choose a list-lookalike whose last element
is also the count they expect:
my $scalar = ( 1, 2, 3 ); # $scalar gets 3, accidentally
What I understand this to mean is that there is no such thing as a list in scalar context.
However, ikegami maintains that it "result[s] in a list operator, so it is a list literal."
So, is it a list or not?
A list literal is something that is actually a list written out in code, so (1, 2, 3) is a list literal, whereas caller for example is a function that could return a list or a scalar depending on context.
In a line like:
my $x = ...;
the ... sees scalar context, so if ... was a list literal, then you would have a list literal in scalar context:
my $x = (1, 2, 3);
but the list literal does not result in a list, because the comma operator it contains sees scalar context, which then results in it returning the last item of the list literal, and throwing the remaining values away after evaluating them.
In terms of a function, the function itself sees whatever context it is called from, which then is propagated to any line in that function that returns. So you can have a function in scalar, list, or void context, and if the last line of that sub happens to be a list literal, that list literal will see any of those contexts and will behave appropriately.
So basically this is a terminology distinction, with list literal referring to a comma separated list of values in the actual source code*, and list referring to a sequence of values placed onto perl's stack.
You can write subroutines with return values that either behave like arrays or like list literals with regard to context.
sub returns_like_array {my #x = 1..5; return #x}
sub returns_like_list {my #x = 1..5; return #x[0 .. $#x]}
*or something that results in a list of comma separated values, like a qw() or a fat comma => or a hash or array slice.
You can also look at my answer here: How do I get the first item from a function that returns an array in Perl? which goes into a bit more detail about lists.
I agree with you and with perlfaq4. To quote perldata, which is probably the definitive documentation on this point:
In a context not requiring a list value, the value of what appears to be a list literal is simply the value of the final element, as with the C comma operator.
(emphasis mine).
That said, ikegami is entitled to use whatever terminology (s)he wants. Different people think of different language constructs in different terms, and as long as the end results are the same, I don't think it matters if their terminology differs from that in the documentation. (It's not a great idea to insist on idiosyncratic terminology in a public forum, though!)
Just ask Perl!
>perl -MO=Concise,-exec -e"my $s = ($x, $y, $z);"
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark v
4 <#> gvsv[*x] s
5 <#> gvsv[*y] s
6 <#> gvsv[*z] s
7 <#> list sKP <--- list in (s)calar context.
8 <0> padsv[$s:1,2] sRM*/LVINTRO
9 <2> sassign vKS/2
a <#> leave[1 ref] vKP/REFC
So yes, one can have a list in scalar context.
One cannot return a list in scalar context. (Well, it is possible from XS, but the program will crash, probably with a "Bizarre copy" error.)
It can't be any other way. If there was a distinct list op and comma op, it would be impossible to compile the following:
sub f { "a", "b", "c" }
Would that result in a list op or a comma op? In reality, there is no such distinction, so yes, it results in that op.
It's not a list. Since you're assigning to a scalar, it's the comma operator - evaluates the left side, evaluates the right side, returns the right side, and resolves left-to-right. $scalar will be 'confused' :)
It's a matter of terminology and perspective. Terminology is tricky here because you can write a list in source code, return a list of values, or evaluate something in list context.. and the word "list" means something subtly different in each case!
Technically, a list cannot exist in scalar context because perl (the interpreter) does not create a list of values for things like this:
my $x = ('a', 'b', 'c');
When being pedantic precise this is explained as the behavior of the comma operator in scalar context. Similarly, qw is defined as returning the last element in scalar context so there's no list here, either:
my $x = qw(a b c);
In both of these cases (and for any other operator examples that we could muster up) what we're saying is that the interpreter doesn't create a list of values. That's an implementation detail. There's no reason that perl couldn't create a list of values and throw away all but the last one; it just chooses not to.
Perl the language is abstract. At a source code level ('a', 'b', 'c') is a list. You can use that list in an expression that imposes scalar context on it, so from that perspective a list can exist in scalar context.
In the end, it's a choice between mental models of how [Pp]erl operates. The "A list in scalar context returns its last value" model is slightly inaccurate but easier to grok. As far as I know there aren't any corner cases where it is functionally incorrect. The "There's no such thing as a list in scalar context" model is more accurate but harder to work with as you need to consider the behavior of every operator in scalar context.
If you believe in Lists in Scalar Context, Clap your Hands
I got this strange line of code today, it tells me 'empty' or 'not empty' depending on whether the CWD has any items (other than . and ..) in it.
I want to know how it works because it makes no sense to me.
perl -le 'print+(q=not =)[2==(()=<.* *>)].empty'
The bit I am interested in is <.* *>. I don't understand how it gets the names of all the files in the directory.
It's a golfed one-liner. The -e flag means to execute the rest of the command line as the program. The -l enables automatic line-end processing.
The <.* *> portion is a glob containing two patterns to expand: .* and *.
This portion
(q=not =)
is a list containing a single value -- the string "not". The q=...= is an alternate string delimiter, apparently used because the single-quote is being used to quote the one-liner.
The [...] portion is the subscript into that list. The value of the subscript will be either 0 (the value "not ") or 1 (nothing, which prints as the empty string) depending on the result of this comparison:
2 == (()=<.* *>)
There's a lot happening here. The comparison tests whether or not the glob returned a list of exactly two items (assumed to be . and ..) but how it does that is tricky. The inner parentheses denote an empty list. Assigning to this list puts the glob in list context so that it returns all the files in the directory. (In scalar context it would behave like an iterator and return only one at a time.) The assignment itself is evaluated in scalar context (being on the right hand side of the comparison) and therefore returns the number of elements assigned.
The leading + is to prevent Perl from parsing the list as arguments to print. The trailing .empty concatenates the string "empty" to whatever came out of the list (i.e. either "not " or the empty string).
<.* *>
is a glob consisting of two patterns: .* are all file names that start with . and * corresponds to all files (this is different than the usual DOS/Windows conventions).
(()=<.* *>)
evaluates the glob in list context, returning all the file names that match.
Then, the comparison with 2 puts it into scalar context so 2 is compared to the number of files returned. If that number is 2, then the only directory entries are . and .., period. ;-)
<.* *> means (glob(".*"), glob("*")). glob expands file patterns the same way the shell does.
I find that the B::Deparse module helps quite a bit in deciphering some stuff that throws off most programmers' eyes, such as the q=...= construct:
$ perl -MO=Deparse,-p,-q,-sC 2>/dev/null << EOF
> print+(q=not =)[2==(()=<.* *>)].empty
> EOF
use File::Glob ();
print((('not ')[(2 == (() = glob('.* *')))] . 'empty'));
Of course, this doesn't instantly produce "readable" code, but it surely converts some of the stumbling blocks.
The documentation for that feature is here. (Scroll near the end of the section)