What is `null vK` in Perl?

Using Perl, I have two similar constructs:
if ($a && $b) { exit() }
do { exit() } if ($a && $b)
I believe these are supposed to be the same thing; however, the second one (the do form) creates an extra null vK opcode:
<1> null vK*/1 ->-
What is the significance of null vK and what does it do?
$ perl -MO=Concise -e'if ($a && $b) { exit() }'
8 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
- <1> null vK/1 ->8
6 <|> and(other->7) vK/1 ->8
- <1> null sK/1 ->6
4 <|> and(other->5) sK/1 ->8
- <1> ex-rv2sv sK/1 ->4
3 <#> gvsv[*a] s ->4
- <1> ex-rv2sv sK/1 ->-
5 <#> gvsv[*b] s ->6
- <#> scope vK ->-
- <;> ex-nextstate(main 3 -e:1) v ->7
7 <0> exit v* ->8
-e syntax OK
Versus the following:
$ perl -MO=Concise -e'do { exit() } if ($a && $b)'
8 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
- <1> null vK/1 ->8
6 <|> and(other->7) vK/1 ->8
- <1> null sKP/1 ->6
4 <|> and(other->5) sK/1 ->8
- <1> ex-rv2sv sK/1 ->4
3 <#> gvsv[*a] s ->4
- <1> ex-rv2sv sK/1 ->-
5 <#> gvsv[*b] s ->6
- <1> null vK*/1 ->-
- <#> scope vK ->-
- <;> ex-nextstate(main 2 -e:1) v ->7
7 <0> exit v* ->8

The "-" at the beginning of the line indicates the op won't get executed, which can also be seen using perl -MO=Concise,-exec.
That said, the null vK... or null sK... opcodes in B::Concise output do not mean that some ops have been optimized away. The perldoc for B::Concise says clearly that such optimization is indicated by ex- in the output:
Nullops appear as "ex-opname", where opname is an op that has been
optimized away by perl. They're displayed with a sequence-number of '-',
because they are not executed (they don't appear in previous example),
they're printed here because they reflect the parse.
For example:
> perl -MO=Concise -e "$a"
4 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
- <1> ex-rv2sv vK/1 ->4
3 <#> gvsv[*a] s ->4
So what are those nulls then?
They are genuine nulls that come from the yacc grammar Perl uses to parse code, and they were never intended for execution.
In your case the extra null vK comes directly from the following do BLOCK grammar rule (perly.y):
termdo : DO term %prec UNIOP /* do $filename */
{ $$ = dofile($2, $1);}
| DO block %prec '(' /* do { code */
{ $$ = newUNOP(OP_NULL, OPf_SPECIAL, op_scope($2));}
;
We can see it here:
>perl -MO=Concise -e "do{}"
4 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 2 -e:1) v:{ ->3
- <1> null vK*/1 ->4
- <#> scope vK ->-
3 <0> stub v ->4
In other cases nulls come from yacc actions.
Apparently, those nulls are used to help manage the op-tree, and since they are never executed, I think the Perl developers are not bothered by their presence.
Here is an example of a null op arising from parsing a boolean expression:
>perl -MO=Concise -e "$a||$b"
6 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
- <1> null vK/1 ->6
4 <|> or(other->5) vK/1 ->6
- <1> ex-rv2sv sK/1 ->4
3 <#> gvsv[*a] s ->4
- <1> ex-rv2sv vK/1 ->-
5 <#> gvsv[*b] s ->6
Why is there a null here? Another snippet helps make it clear:
>perl -MO=Concise -e "!$a&&!$b"
7 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
6 <1> not vK/1 ->7
4 <|> or(other->5) sK/1 ->6
- <1> ex-not sK/1 ->4
- <1> ex-rv2sv sK/1 ->-
3 <#> gvsv[*a] s ->4
- <1> ex-not sK/1 ->6
- <1> ex-rv2sv sK/1 ->-
5 <#> gvsv[*b] s ->6
It appears that null vK has become not vK.
Looking carefully, we can see that Perl optimized !$a&&!$b into !($a||$b), with not (!) taking the place of null.
It turns out that Perl always reserves a parent opcode for logical expressions: if an expression can be simplified with an outer not, Perl puts not into the parent opcode, and null otherwise.
To summarize: NULL opcodes indicated by ex- in B::Concise output are made by the optimizer, and NULL opcodes indicated by null come from the grammar parser. Neither kind is ever executed, and neither carries a performance penalty.
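The distinction summarized above can also be inspected from inside Perl. The following is a minimal sketch using the core B module; the helper name collect_null_ops and the sample sub are made up for illustration. It assumes, as far as I can tell from B::Concise, that a null op whose targ still records an original op type is the kind shown as ex-NAME. The exact list printed depends on the Perl version.

```perl
use strict;
use warnings;
use B qw(svref_2object ppname OPf_KIDS);

# Collect a label for every OP_NULL node in the op tree below $op.
# Assumed naming rule: a null op whose targ still holds an original
# op type is labelled "ex-NAME"; otherwise it is a plain "null".
sub collect_null_ops {
    my ($op, $found) = @_;
    return $found unless $$op;    # B::NULL objects wrap a null pointer
    if ($op->name eq 'null') {
        push @$found, $op->targ
            ? 'ex-' . substr(ppname($op->targ), 3)    # strip leading "pp_"
            : 'null';
    }
    if ($op->flags & OPf_KIDS) {
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            collect_null_ops($kid, $found);
        }
    }
    return $found;
}

# Hypothetical sample sub, shaped like the "do { ... } if (...)" form.
my $sample = sub { do { return 1 } if ($_[0] && $_[1]) };

my $labels = collect_null_ops(svref_2object($sample)->ROOT, []);
print "$_\n" for @$labels;
```

Every entry printed is an op that perl -MO=Concise would show with a "-" in the sequence column.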

Related

Perl increment operator

$a = 10;
$b = (++$a) + (++$a) + (++$a);
print $b;
I am getting the answer 37.
Can anybody explain how this operation proceeds and why the result is 37?
By my logic it should be 36:
(++$a) + (++$a) + (++$a)
11 + 12 + 13 = 36
But I am getting the answer 37
Perl is executing this as
( ( $a = $a + 1 ) + ( $a = $a + 1 ) ) + ( $a = $a + 1 )
You have even put the ++$a in parentheses, as if to say that they should happen first, before the additions, although they have higher precedence anyway.
This is centred around the fact that the assignment operator = returns its first operand, which allows operations like
(my $x = $y) =~ tr/A-Z/a-z/
If the result of the assignment were simply the value copied from $y to $x, then the tr/// would cause a "Can't modify a constant item" error or the equivalent, and it would have no effect on what was stored in either variable.
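As a small illustration of that idiom (the variable names here are mine, not from the question), the parenthesised assignment yields the left-hand variable itself, so tr/// edits the copy and leaves the source untouched:

```perl
use strict;
use warnings;

my $source = "HELLO";

# The parenthesised assignment yields $copy itself, so tr/// edits $copy in place.
(my $copy = $source) =~ tr/A-Z/a-z/;

print "$copy $source\n";    # prints "hello HELLO"
```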
Here the variable is $a, and the execution is as follows:
Execute the first increment, returning $a
$a is now 11
Execute the second increment, returning $a again
$a is now 12
Execute the first addition, which adds what was returned by the two increments—both $a
$a is 12, so $a + $a is 24
Execute the third increment, returning $a again
$a is now 13
Execute the second addition, which adds what was returned by the first addition (24) and the third increment ($a)
$a is 13, so 24 + $a is 37
Note that this should not be relied on. It is not documented anywhere except to say that it is undefined, and the behaviour could change with any release of Perl.
As a complement to mob and Borodin's answer, you can see what's happening clearly if you think about how the operations are interacting with the stack and recognize that preinc returns the variable, not its value.
op | a's value | stack
$a | 10 | $a
++ | 11 | $a
$a | 11 | $a $a
++ | 12 | $a $a
+ | 12 | 24
$a | 12 | 24 $a
++ | 13 | 24 $a
+ | 13 | 37
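The table can be replayed with ordinary references, without relying on the undefined behaviour itself. This is only a model, under the assumption stated above that each preincrement leaves the variable (not a copy) on the stack; the sub names are invented for the sketch.

```perl
use strict;
use warnings;

# Model 1: each ++ pushes an alias (a reference) to the variable.
sub sum_with_aliases {
    my ($n) = @_;
    my @stack;

    $n += 1; push @stack, \$n;                 # first ++: push an alias to $n
    $n += 1; push @stack, \$n;                 # second ++: push the same alias
    my ($left, $right) = splice @stack, -2;    # first +: pop both operands
    my $partial = $$left + $$right;            # both read the current $n
    push @stack, \$partial;

    $n += 1; push @stack, \$n;                 # third ++
    ($left, $right) = splice @stack, -2;       # second +
    return $$left + $$right;
}

# Model 2: each ++ pushes a copy of the value at that moment.
sub sum_with_copies {
    my ($n) = @_;
    my @values;
    for (1 .. 3) {
        $n += 1;
        push @values, $n;                      # push stores a copy
    }
    return $values[0] + $values[1] + $values[2];
}

print sum_with_aliases(10), "\n";    # prints 37
print sum_with_copies(10),  "\n";    # prints 36
```

The alias model reproduces the 37 that was observed, and the copy model gives the 36 that was expected.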
As it has been noted in comments, changing a variable multiple times within a single statement leads to undefined behavior, as explained in perlop.
So the exact behavior is not specified and may vary between versions and implementations.
As to how it works out, here is one way to see it. Since + is a binary operator, at each operation its left-hand-side operand does get involved when ++ is executed on the other. So at each position $a gets ++ed, and picks up another increment as a LHS operand.
That means that the LHS $a gets incremented additionally (to its ++) once in each + operation. The + operations after the first one must accumulate these, one extra for each extra term. With three terms here that's another +3, once. So there are altogether 7 increments.
Yet another (fourth) term incurs an extra +4, etc
perl -wE'$x=10; $y = ++$x + ++$x + ++$x + ++$x; say $y' # 4*10 + 2+2+3+4
This is interesting to tweak by changing ++$x to $x++ -- the effect depends on position.
Increments in steps
first $a gets incremented (to 11)
in the first addition, as the second $a is incremented (to 11) the first one gets a bump as well being an operand (to 12)
in the second addition, the second $a gets incremented (to 12) as an operand
as the second addition comes up, the third $a is updated and thus picks up increments from both additions, plus its increment (to 13)
The enumeration of $a above refers to their presence at multiple places in the statement.
As @Håkon Hægland pointed out, running this code under B::Concise, which outputs the opcodes that the Perl script generates, is illuminating. Here are two examples slightly different from the one you provided:
$ perl -E 'say $b=$a + ((++$a)+(++$a))'
6
$ perl -E 'say $b=($a+(++$a)) + (++$a)'
4
So what's going on here? Let's look at the opcodes:
$ perl -MO=Concise -E 'say $b=$a+((++$a)+(++$a))'
e <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 47 -e:1) v:%,{,469764096 ->3
d <#> say vK ->e
3 <0> pushmark s ->4
c <2> sassign sKS/2 ->d
a <2> add[t6] sK/2 ->b
- <1> ex-rv2sv sK/1 ->5
4 <#> gvsv[*a] s ->5
9 <2> add[t5] sKP/2 ->a
6 <1> preinc sKP/1 ->7
- <1> ex-rv2sv sKRM/1 ->6
5 <#> gvsv[*a] s ->6
8 <1> preinc sKP/1 ->9
- <1> ex-rv2sv sKRM/1 ->8
7 <#> gvsv[*a] s ->8
- <1> ex-rv2sv sKRM*/1 ->c
b <#> gvsv[*b] s ->c
-e syntax OK
There are no conditionals in this program. The leftmost column indicates the order of operations in this program. Wherever you see the ex-rv2sv token, that is where Perl is reading the value of an expression like a global scalar variable.
The preinc operations occur at labels 6 and 8. The add operations occur at labels 9 and a. This tells us that both increments occurred before Perl performed the additions, and so the final expression would be something like 2 + (2 + 2) = 6.
In the other example, the opcodes look like
$ perl -MO=Concise -E 'say $b=($a+(++$a)) + (++$a)'
e <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 47 -e:1) v:%,{,469764096 ->3
d <#> say vK ->e
3 <0> pushmark s ->4
c <2> sassign sKS/2 ->d
a <2> add[t6] sK/2 ->b
7 <2> add[t4] sKP/2 ->8
- <1> ex-rv2sv sK/1 ->5
4 <#> gvsv[*a] s ->5
6 <1> preinc sKP/1 ->7
- <1> ex-rv2sv sKRM/1 ->6
5 <#> gvsv[*a] s ->6
9 <1> preinc sKP/1 ->a
- <1> ex-rv2sv sKRM/1 ->9
8 <#> gvsv[*a] s ->9
- <1> ex-rv2sv sKRM*/1 ->c
b <#> gvsv[*b] s ->c
-e syntax OK
Now the preinc operations occur at 6 and 9, but there is an add operation at label 7, after $a has only been incremented one time. This makes the values used in the final expression (1 + 1) + 2 = 4.
So in your example:
$ perl -MO=Concise -E '$a=10;$b=(++$a)+(++$a)+(++$a);say $b'
l <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 47 -e:1) v:%,{,469764096 ->3
5 <2> sassign vKS/2 ->6
3 <$> const[IV 10] s ->4
- <1> ex-rv2sv sKRM*/1 ->5
4 <#> gvsv[*a] s ->5
6 <;> nextstate(main 47 -e:1) v:%,{,469764096 ->7
g <2> sassign vKS/2 ->h
e <2> add[t7] sK/2 ->f
b <2> add[t5] sK/2 ->c
8 <1> preinc sKP/1 ->9
- <1> ex-rv2sv sKRM/1 ->8
7 <#> gvsv[*a] s ->8
a <1> preinc sKP/1 ->b
- <1> ex-rv2sv sKRM/1 ->a
9 <#> gvsv[*a] s ->a
d <1> preinc sKP/1 ->e
- <1> ex-rv2sv sKRM/1 ->d
c <#> gvsv[*a] s ->d
- <1> ex-rv2sv sKRM*/1 ->g
f <#> gvsv[*b] s ->g
h <;> nextstate(main 47 -e:1) v:%,{,469764096 ->i
k <#> say vK ->l
i <0> pushmark s ->j
- <1> ex-rv2sv sK/1 ->k
j <#> gvsv[*b] s ->k
-e syntax OK
We see preinc occurring at labels 8, a, and d. The add operations occur at b and e. That is, $a is incremented twice, then two $a's are added together. Then $a is incremented again. Then $a is added to the result. So the output is (12 + 12) + 13 = 37.

Find the C APIs used for perl script

Trying to understand the C code that's behind a perl script. For example, the following contrived code:
$name = "john";
$greeting = "hi $name, how old are you?";
if ($greeting =~ /hi (\S+)/) {
    $b = $1;
    print "got $b as expected\n";
}
I would like to know how the variable $name is substituted into the $greeting string, and also what C API is used for the regular expression match.
I heard about something like perl -MO=Bytecode,-H test.pl, where test.pl has the above content, but the output is binary.
There isn't a direct mapping of Perl code to C code. Instead, Perl is a bytecode compiler. What you can get is the bytecode, the tree of opcodes. There are several modules that present this in human-readable form; one is B::Concise.
perl -MO=Concise test.pl
Produces this...
w <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 test.plx:1) v:{ ->3
5 <2> sassign vKS/2 ->6
3 <$> const[PV "john"] s ->4
- <1> ex-rv2sv sKRM*/1 ->5
4 <#> gvsv[*name] s ->5
6 <;> nextstate(main 1 test.plx:2) v:{ ->7
d <2> sassign vKS/2 ->e
- <1> ex-stringify sK/1 ->c
- <0> ex-pushmark s ->7
b <2> concat[t5] sKS/2 ->c
9 <2> concat[t4] sK/2 ->a
7 <$> const[PV "hi "] s ->8
- <1> ex-rv2sv sK/1 ->9
8 <#> gvsv[*name] s ->9
a <$> const[PV ", how old are you?"] s ->b
- <1> ex-rv2sv sKRM*/1 ->d
c <#> gvsv[*greeting] s ->d
The documentation for B::Concise explains all this. This tells you the operator sequence, type, name, flags, and the next op in the sequence. For example...
7 <$> const[PV "hi "] s ->8
This is operator 7, it is an SVOP (it applies to scalars), its name is "const" and it's for the scalar string (PV) "hi ", it's in scalar context, and the next operator is 8.
More about operators can be learned from perlguts and the Illustrated Perl Guts and by poking around in the Perl source code. Each operator has a C function associated with it called pp_OPNAME so to find the "const" operator look for pp_const.
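To get the pp_OPNAME names without reading Concise output by eye, something like the following sketch may help. It uses the core B module; the helper pp_names and the sample sub are hypothetical. Note that ppname only builds the conventional "pp_" name from the op name, and the ops listed depend on the Perl version.

```perl
use strict;
use warnings;
use B qw(svref_2object ppname);

# Follow the execution chain of a sub from its first op and pair each
# op name with the conventional name of the C function behind it.
sub pp_names {
    my ($code) = @_;
    my @pairs;
    for (my $op = svref_2object($code)->START; $$op; $op = $op->next) {
        push @pairs, [ $op->name, ppname($op->type) ];
    }
    return @pairs;
}

# Hypothetical sample: straight-line code, so ->next reaches every op that runs.
my $sample = sub {
    my $name     = "john";
    my $greeting = "hi $name, how old are you?";
};

printf "%-12s %s\n", @$_ for pp_names($sample);
```

Each name in the right-hand column is what to search for in the Perl source tree.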
The Perl regular expression engine is completely custom and has its own perlreguts documentation.

Perl concatenate operator vs. append operator

First, here is the example straight from Learning Perl (p.29)
# Append a space to $str
$str = $str . " ";
# The same thing with an assignment operator
$str .= " ";
Is either of these methods more "correct" or preferred for speed or syntactical reasons?
Looking at the Concise output for each option:
perl -MO=Concise,-exec -e 'my $str = "a"; $str = $str . " ";'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "a"] s
4 <0> padsv[$str:1,2] sRM*/LVINTRO
5 <2> sassign vKS/2
6 <;> nextstate(main 2 -e:1) v:{
7 <0> padsv[$str:1,2] s
8 <$> const[PV " "] s
9 <2> concat[$str:1,2] sK/TARGMY,2
a <#> leave[1 ref] vKP/REFC
-e syntax OK
perl -MO=Concise,-exec -e 'my $str = "a"; $str .= " ";'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <$> const[PV "a"] s
4 <0> padsv[$str:1,2] sRM*/LVINTRO
5 <2> sassign vKS/2
6 <;> nextstate(main 2 -e:1) v:{
7 <0> padsv[$str:1,2] sRM
8 <$> const[PV " "] s
9 <2> concat[t2] vKS/2
a <#> leave[1 ref] vKP/REFC
-e syntax OK
While they are slightly different (.= does concatenation in a void context, and the other in scalar) the main reason to choose one or the other is style/maintainability. I prefer to write:
$str .= " ";
Mainly for ease of typing and because it's obvious you're appending to the end of the string without having to check that the variable on the RHS is the same as on the LHS.
Essentially: Use whichever you prefer!
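If speed is the worry, it is easy enough to measure on your own build. Here is a rough sketch using the core Benchmark module; the sub names are made up, and the numbers will vary by machine and Perl version, so treat the table as indicative only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

sub with_concat { my $str = "a"; $str = $str . " "; return $str }
sub with_append { my $str = "a"; $str .= " ";       return $str }

# Both forms build the same string.
print with_concat() eq with_append() ? "same result\n" : "different result\n";

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { concat => \&with_concat, append => \&with_append });
```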

Is there any difference between &$func($arg) and $func->($arg)?

While trying to understand closures and reading through the Perl FAQ and the coderef material in perlref, I found these examples:
sub add_function_generator {
    return sub { shift() + shift() };
}
my $add_sub = add_function_generator();
my $sum = $add_sub->(4,5);
and
sub newprint {
    my $x = shift;
    return sub { my $y = shift; print "$x, $y!\n"; };
}
$h = newprint("Howdy");
&$h("world");
There are two forms of calling a function stored in a variable.
&$func($arg)
$func->($arg)
Are those totally equivalent (only syntactically different), or are there some differences?
There is no difference. Proof: the opcodes generated by each version:
$ perl -MO=Concise -e'my $func; $func->()'
8 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
3 <0> padsv[$func:1,2] vM/LVINTRO ->4
4 <;> nextstate(main 2 -e:1) v:{ ->5
7 <1> entersub[t2] vKS/TARG ->8
- <1> ex-list K ->7
5 <0> pushmark s ->6
- <1> ex-rv2cv K ->-
6 <0> padsv[$func:1,2] s ->7
$ perl -MO=Concise -e'my $func; &$func()'
8 <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
3 <0> padsv[$func:1,2] vM/LVINTRO ->4
4 <;> nextstate(main 2 -e:1) v:{ ->5
7 <1> entersub[t2] vKS/TARG ->8
- <1> ex-list K ->7
5 <0> pushmark s ->6
- <1> ex-rv2cv sKPRMS/4 ->-
6 <0> padsv[$func:1,2] s ->7
… wait, there are actually slight differences in the flags for - <1> ex-rv2cv sKPRMS/4 ->-. Anyway, they don't seem to matter, and both forms behave the same.
But I would recommend using the form $func->(): I perceive this syntax as more elegant, and you can't accidentally forget to use parens (&$func works but makes the current @_ visible to the function, which is not what you'd usually want).
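The @_ caveat is easy to see in a short test. In this sketch (all names are illustrative), the coderef just reports how many arguments it received:

```perl
use strict;
use warnings;

my $count_args = sub { return scalar @_ };

sub call_with_arrow      { return $count_args->(); }    # explicit empty list
sub call_with_amp_parens { return &$count_args(); }     # explicit empty list
sub call_with_amp_bare   { return &$count_args; }       # reuses the caller's @_

print "arrow:      ", call_with_arrow(1, 2, 3),      "\n";    # prints 0
print "amp+parens: ", call_with_amp_parens(1, 2, 3), "\n";    # prints 0
print "amp bare:   ", call_with_amp_bare(1, 2, 3),   "\n";    # prints 3
```

Only the bare &$count_args form sees the three arguments that were passed to the enclosing sub.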

Difference between $x->{a}{b} and $x->{a}->{b}

Is there any conceivable difference between
$x->{a}{b}
and
$x->{a}->{b}
for any allowed value of $x->{a}, in any of the perl versions >= 5.6?
No. This is just a syntactic shortcut without any semantic difference.
Proof: the opcodes that are produced upon compilation
$ perl -MO=Concise -e'$x->{a}{b}'
b <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
a <2> helem vK/2 ->b
8 <1> rv2hv[t3] sKR/1 ->9
7 <2> helem sKM/DREFHV,2 ->8
5 <1> rv2hv[t2] sKR/1 ->6
4 <1> rv2sv sKM/DREFHV,1 ->5
3 <#> gv[*x] s ->4
6 <$> const[PV "a"] s/BARE ->7
9 <$> const[PV "b"] s/BARE ->a
-e syntax OK
$ perl -MO=Concise -e'$x->{a}->{b}'
b <#> leave[1 ref] vKP/REFC ->(end)
1 <0> enter ->2
2 <;> nextstate(main 1 -e:1) v:{ ->3
a <2> helem vK/2 ->b
8 <1> rv2hv[t3] sKR/1 ->9
7 <2> helem sKM/DREFHV,2 ->8
5 <1> rv2hv[t2] sKR/1 ->6
4 <1> rv2sv sKM/DREFHV,1 ->5
3 <#> gv[*x] s ->4
6 <$> const[PV "a"] s/BARE ->7
9 <$> const[PV "b"] s/BARE ->a
-e syntax OK
See also perlref, section Using References, rule 3.
There are some places where -> didn't become optional until later than 5.6. I believe these are some:
$x->('a'){'b'} # coderef called, returning a reference
({a=>42})[0]{'a'} # reference from a list slice
The constructs are identical. Perl allows the -> between any pair of closing and opening brackets to be omitted.
This works, and prints OKOK.
use strict;
use warnings;
my $data = {
    a => {
        b => [
            sub { { c => 'OK' } }
        ]
    }
};
print $data->{a}->{b}->[0]->()->{c};
print $data->{a}{b}[0](){'c'};
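One thing worth keeping in mind is that only the arrows between subscripts are optional; the first arrow after a scalar holding a reference is not. A small sketch with made-up data, where %x and $x are two unrelated variables:

```perl
use strict;
use warnings;

my %x = (a => { b => 'from the hash %x' });
my $x = { a => { b => 'from the hashref $x' } };

print $x{a}{b},     "\n";    # element of %x; arrow omitted between subscripts
print $x->{a}{b},   "\n";    # element of the hash that $x refers to
print $x->{a}->{b}, "\n";    # same element as the previous line
```

Dropping the first arrow therefore does not shorten the expression; it looks up a different variable.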