Passing an array reference to a perl callback using call_sv(), should I use newRV() or newRV_noinc()? - perl

I have this XS code (XsTest.xs):
#define PERL_NO_GET_CONTEXT
#include "EXTERN.h"
#include "perl.h"
#include "XSUB.h"
MODULE = My::XsTest PACKAGE = My::XsTest
PROTOTYPES: DISABLE
void
foo( callback )
SV *callback
PREINIT:
AV *array;
SSize_t array_len;
SV **sv_ptr;
SV *sv;
double value;
CODE:
if ( !SvROK(callback) ) {
croak("Not a reference!");
}
if ( SvTYPE(SvRV(callback)) != SVt_PVCV ) {
croak("Not a code reference!");
}
/* This array will go out of scope (and be freed) at the end of this XSUB
* due to the sv_2mortal()
*/
array = (AV *)sv_2mortal((SV *)newAV()); /* Line #28 */
/* NOTE: calling dSP is not necessary for an XSUB, since it has
* already been arranged for by xsubpp by calling dXSARGS
*/
printf( "Line #28: SvREFCNT(array) = %d\n", SvREFCNT(array));
ENTER;
SAVETMPS;
PUSHMARK(SP);
EXTEND(SP, 1);
/* Should I use newRV_inc() or newRV_noinc() here? Or does it not
* matter?
* NOTE: XPUSHs mortalizes the RV (so we do not need to call sv_2mortal()
*/
XPUSHs((SV *)newRV_inc((SV *) array)); /* Line #41 */
printf( "Line #41: SvREFCNT(array) = %d\n", SvREFCNT(array));
PUTBACK;
call_sv(callback, G_VOID);
printf( "Line #45: SvREFCNT(array) = %d\n", SvREFCNT(array)); /* Line #45: */
array_len = av_top_index(array) + 1;
printf( "Array length: %ld\n", array_len );
if ( array_len != 1 ) {
croak( "Unexpected array size: %ld", array_len );
}
sv_ptr = av_fetch( array, 0, 0 );
sv = *sv_ptr;
if (SvTYPE(sv) >= SVt_PVAV) {
croak("Not a scalar value!");
}
value = SvNV(sv);
printf( "Returned value: %g\n", value);
FREETMPS; /* Line # 58 */
LEAVE;
printf( "Line #60: SvREFCNT(array) = %d\n", SvREFCNT(array));
I trying to figure out whether to use newRV_inc() or newRV_noinc() at line #41.
The Perl callback is defined in the test script p.pl:
use strict;
use warnings;
use ExtUtils::testlib;
use My::XsTest;
sub callback {
my ( $ar ) = #_;
$ar->[0] = 3.12;
}
My::XsTest::foo( \&callback );
The output from running p.pl is:
Line #28: SvREFCNT(array) = 1
Line #41: SvREFCNT(array) = 2
Line #45: SvREFCNT(array) = 2
Array length: 1
Returned value: 3.12
Line #60: SvREFCNT(array) = 2
As far as I can see, if I use newRV_inc():
The reference count of array is set to 1 at line #28 when calling newAV(),
then it is decreased to zero also at line #28 when calling sv_2mortal() on the same array,
at line #41 I create a reference using newRV_inc() and the reference count of array is increased back again from 0 to 1 (due to the _inc in newRV_inc()),
at line #58, the FREETMPS macro is called, but this does not affect (?) the refcount of array since it was created outside the SAVETMPS boundary we set up for the callback temporaries. On the other hand, the reference we pushed at line #41 is freed here (since it was made mortal), which causes it to relinquish its ownership of array and hence the reference count of array will be decreased to zero again.
at line #60, the XSUB exits and array will be freed (on a following call by the perl runop loop to FREETMPS) since it has reference count of zero. All the scalars in the array will also be freed at that point (?).
The problem with the above reasoning is that it does not agree with the output from SvREFCNT() as shown above. According to the output, the reference count of array is 2 (and not 1) at exit.
What is going on here?
Additional files to reproduce:
lib/My/XsTest.pm:
package My::XsTest;
use strict;
use warnings;
use Exporter qw(import);
our %EXPORT_TAGS = ( 'all' => [ qw( ) ] );
our #EXPORT_OK = ( #{ $EXPORT_TAGS{'all'} } );
our #EXPORT = qw( );
our $VERSION = 0.01;
require XSLoader;
XSLoader::load();
1;
Makefile.PL:
use 5.028001;
use strict;
use warnings;
use utf8;
use ExtUtils::MakeMaker 7.12; # for XSMULTI option
WriteMakefile(
NAME => 'My::XsTest',
VERSION_FROM => 'lib/My/XsTest.pm',
PREREQ_PM => { 'ExtUtils::MakeMaker' => '7.12' },
ABSTRACT_FROM => 'lib/My/XsTest.pm',
AUTHOR => 'Håkon Hægland <hakon.hagland#gmail.com>',
OPTIMIZE => '', # e.g., -O3 (for optimize), -g (for debugging)
XSMULTI => 0,
LICENSE => 'perl',
LIBS => [''], # e.g., '-lm'
DEFINE => '', # e.g., '-DHAVE_SOMETHING'
INC => '-I.', # e.g., '-I. -I/usr/include/other'
)
Compilation
To compile the module, run:
perl Makefile.PL
make

You should use newRV_inc().
Your actual problem is that you are creating a new RV which leaks. The fact that the RV is never freed means that the reference count on array is never decremented. You need to mortalise the return value of newRV_inc().
One other comment: the reference count of array is not reduced to zero when you mortalise it; it remains as 1. I'm not sure where you got that idea from. What actually happens is that when you call newAV(), you are given an AV with a reference count of one, which is 1 too high. Left as-is, it will leak. sv_2mortal() doesn't change array's ref count, but it does take ownership of one reference, which "corrects" the overall reference count and array will no longer leak.

Related

Why does Test::LeakTrace say this Perl code is leaking memory?

Test::LeakTrace says code leaking. I don't understand.
I do not understand output of Test::LeakTrace, and it is too long for this post. Some leaks from Test system, but others? No.
This is the code.
use 5.26.0;
use warnings;
use Test::More;
use Test::LeakTrace;
sub spawn {
my %methods = #_;
state $spawned = 1;
my $object = bless {}, "Spawned::Class$spawned";
$spawned++;
while ( my ( $method, $value ) = each %methods ) {
no strict 'refs';
*{ join '::', ref($object), $method } = sub { $value };
}
return $object;
}
no_leaks_ok {
my $spawn = spawn( this => 2 );
is( $spawn->this, 2 );
}
'no leaks';
done_testing;
I get weird things like this:
# leaked SCALAR(0x7f9b41a069c0) from leak.pl line 11.
# 10: $spawned++;
# 11: while ( my ( $method, $value ) = each %methods ) {
# 12: no strict 'refs';
# SV = IV(0x7f9b41a069b0) at 0x7f9b41a069c0
# REFCNT = 1
# FLAGS = (IOK,pIOK)
# IV = 2
And this:
# leaked GLOB(0x7f9b411b22a0) from leak.pl line 9.
# 8: state $spawned = 1;
# 9: my $object = bless {}, "Spawned::Class$spawned";
# 10: $spawned++;
# SV = PVGV(0x7f9b41a29530) at 0x7f9b411b22a0
# REFCNT = 1
# FLAGS = (MULTI)
# NAME = "Class4::"
# NAMELEN = 8
# GvSTASH = 0x7f9b4081a0a8 "Spawned"
# FLAGS = 0x2
# GP = 0x7f9b40534720
# SV = 0x0
# REFCNT = 1
# IO = 0x0
# FORM = 0x0
# AV = 0x0
# HV = 0x7f9b41a24730
# CV = 0x0
# CVGEN = 0x0
# GPFLAGS = 0x0 ()
# LINE = 9
# FILE = "leak.pl"
# EGV = 0x7f9b411b22a0 "Class4::"
Nothing makes sense to me. Reference counts are 1.
Your code leaks. They are intentional leaks, but leaks nonetheless.
You create a package that is never freed.[1] In it you create a glob that is never freed. To this glob you assign a sub that is never freed. The sub captures a variable, so it's never freed.
The module is doing its job and telling you about this.
I encountered a few surprises confirming the above was what was happening. The rest of this answer identifies them and explains them.
I'll be using this program (a.pl):
use 5.010;
use Test::More tests => 1;
use Test::LeakTrace;
sub f {
state $spawned = 1;
my $object = bless {}, "Spawned::Class$spawned" if $ARGV[0] & 1;
$spawned++ if $ARGV[0] & 2;
delete $Spawned::{"Class".($spawned-1)."::"} if $ARGV[0] & 4;
}
If we do $spawned++; but not the bless:
$ perl a.pl 1
1..1
ok 1 - leaks 0 <= 0
Expected.
If we do the bless but not $spawned++;:
$ perl a.pl 2
1..1
ok 1 - leaks 0 <= 0
huh!? We created global symbols. Shouldn't those be considered leaks? So why did the OP produce leaks, then? I'll come back to this.
If we do both:
$ perl a.pl 3
1..1
not ok 1 - leaks 8 <= 0
# Failed test 'leaks 8 <= 0'
# at a.pl line 11.
# '8'
# <=
# '0'
#
# [snip]
huh?! Why is it suddenly mentioning the global symbols we created?! I mean, it's what we expect, but we expected it above too. I'll come back to this.
Finally, we'll also undo the changes we made.
$ perl a.pl 7
1..1
ok 1 - leaks 0 <= 0
As expected, if we release the additions we made to the global symbol table, it no longer reports any leaks.
Now let's address the questions I raised.
Imagine if you had done something like
state $cache = { };
You wouldn't want that hash to be reported as a leak even though it's never freed. To that end, Test::LeakTrace evaluates the test block twice, ignoring leaks from the first call.
Leaked SVs are SVs which are not released after the end of the scope they have been created. These SVs include global variables and internal caches. For example, if you call a method in a tracing block, perl might prepare a cache for the method. Thus, to trace true leaks, no_leaks_ok() and leaks_cmp_ok() executes a block more than once.
That's why perl a.pl 2 didn't result in any reported leaks.
But perl a.pl 3 and the OP's code (intentionally) leak every time they are called, not just the first. Test::LeakTrace has no way to know those leaks are intentional, so you get what I suppose you could call false positives.
When I say "never freed", I mean "never freed until global destruction". Everything gets freed then.

Why I can get subroutine address before it is declared without error?

I have next program:
use warnings;
use strict;
BEGIN {
print \&mysub;
}
sub mysub {};
print \&mysub;
Its output:
CODE(0x118e890)CODE(0x118e890)
The BEGIN block is processed in compile time. At that point definition of sub mysub is not seen by compiler yet. But program still prints right subroutine address, which it will have when defined.
Why I do not get error here? Is this some sort of autovivification?
Yes, this is a form of autovivification. A stub is created when a reference to the sub is required and the sub doesn't exist.
use strict;
use warnings qw( all );
use feature qw( say );
sub test {
say defined(&mysub) ? "defined (".\&mysub.")"
: exists(&mysub) ? "exists (".\&mysub.")"
: "doesn't exist";
}
test();
my $ref = \&mysub;
test();
eval("sub mysub { } 1") or die($#);
test();
Output:
doesn't exist
exists (CODE(0xab8cd8))
defined (CODE(0xab8cd8))
Very interesting question. I'm writing this as an answer instead of a comment because it will be rather long, but there are still some bits I'm not entirely sure about.
I believe your intuition is correct and that it is a form of autovivification.
Devel::Peek can spread more light on what's happening.
I changed your code a little bit:
use warnings;
use strict;
use Devel::Peek;
$|++;
BEGIN {
Dump( \&mysub );
print \&mysub;
};
sub mysub {};
Dump( \&mysub );
print \&mysub;
I added $|++ so that buffering won't be cause of confusions, and added calls to Devel::Peek::Dump to look into the reference \&mysub. Here the output on my system:
SV = IV(0x2628628) at 0x2628638
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x26286e0
SV = PVCV(0x2640750) at 0x26286e0
REFCNT = 2
FLAGS = (DYNFILE)
COMP_STASH = 0x25ffdb0 "main"
ROOT = 0x0
GVGV::GV = 0x26287a0 "main" :: "mysub"
FILE = "/tmp/autov.pl"
DEPTH = 0
FLAGS = 0x1000
OUTSIDE_SEQ = 0
PADLIST = 0x0
OUTSIDE = 0x0 (null)
CODE(0x26286e0)SV = IV(0x25fff20) at 0x25fff30
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x26286e0
SV = PVCV(0x2640750) at 0x26286e0
REFCNT = 2
FLAGS = (DYNFILE)
COMP_STASH = 0x25ffdb0 "main"
START = 0x262ea50 ===> 1
ROOT = 0x262ea10
GVGV::GV = 0x26287a0 "main" :: "mysub"
FILE = "/tmp/autov.pl"
DEPTH = 0
FLAGS = 0x1000
OUTSIDE_SEQ = 371
PADLIST = 0x2648620
PADNAME = 0x2630180(0x2667f70) PAD = 0x2628770(0x262f020)
OUTSIDE = 0x2600140 (MAIN)
CODE(0x26286e0)
Note how Dump's output changes between the two calls.
The first time Dump is called we just have a reference to a empty scalar.
The second time, after the actual definition of the subroutine, you can see the details that pertain to subroutines have been fleshed out: namely PADLIST (now not null), PADNAME and START (I'm not an expert of Perl guts but I think this is the actual "pointer" to the subroutine).
I hope this helps. I'd be interested in knowing what you'll discover if you'll dig deeper in the problem.

Where does a Perl subroutine get values missing from the actual parameters?

I came across the following Perl subroutine get_billable_pages while chasing a bug. It takes 12 arguments.
sub get_billable_pages {
my ($dbc,
$bill_pages, $page_count, $cover_page_count,
$domain_det_page, $bill_cover_page, $virtual_page_billing,
$job, $bsj, $xqn,
$direction, $attempt,
) = #_;
my $billable_pages = 0;
if ($virtual_page_billing) {
my #row;
### Below is testing on the existence of the 11th and 12th parameters ###
if ( length($direction) && length($attempt) ) {
$dbc->xdb_execute("
SELECT convert(int, value)
FROM job_attribute_detail_atmp_tbl
WHERE job = $job
AND billing_sub_job = $bsj
AND xqn = $xqn
AND direction = '$direction'
AND attempt = $attempt
AND attribute = 1
");
}
else {
$dbc->xdb_execute("
SELECT convert(int, value)
FROM job_attribute_detail_tbl
WHERE job = $job
AND billing_sub_job = $bsj
AND xqn = $xqn
AND attribute = 1
");
}
$cnt = 0;
...;
But is sometimes called with only 10 arguments
$tmp_det = get_billable_pages(
$dbc2,
$row[6], $row[8], $row[7],
$domain_det_page, $bill_cover_page, $virtual_page_billing,
$job1, $bsj1, $row[3],
);
The function does a check on the 11th and 12th arguments.
What are the 11th and 12th arguments when the function is passed only 10 arguments?
Is it a bug to call the function with only 10 arguments because the 11th and 12th arguments end up being random values?
I am thinking this may be the source of the bug because the 12th argument had a funky value when the program failed.
I did not see another definition of the function which takes only 10 arguments.
The values are copied out of the parameter array #_ to the list of scalar variables.
If the array is shorter than the list, then the excess variables are set to undef. If the array is longer than the list, then excess array elements are ignored.
Note that the original array #_ is unmodified by the assignment. No values are created or lost, so it remains the definitive source of the actual parameters passed when the subroutine is called.
ikegami suggested that I should provide some Perl code to demonstrate the assignment of arrays to lists of scalars. Here is that Perl code, based mostly on his edit
use strict;
use warnings;
use Data::Dumper;
my $x = 44; # Make sure that we
my $y = 55; # know if they change
my #params = (8); # Make a dummy parameter array with only one value
($x, $y) = #params; # Copy as if this is were a subroutine
print Dumper $x, $y; # Let's see our parameters
print Dumper \#params; # And how the parameter array looks
output
$VAR1 = 8;
$VAR2 = undef;
$VAR1 = [ 8 ];
So both $x and $y are modified, but if there are insufficient values in the array then undef is used instead. It is as if the source array was extended indefinitely with undef elements.
Now let's look at the logic of the Perl code. undef evaluates as false for the purposes of conditional tests, but you apply the length operator like this
if ( length($direction) && length($attempt) ) { ... }
If you have use warnings in place as you should, Perl would normally produce a Use of uninitialized value warning. However length is unusual in that, if you ask for the length of an undef value (and you are running version 12 or later of Perl 5) it will just return undef instead of warning you.
Regarding "I did not see another definition of the function which takes only 10 arguments", Perl doesn't have function templates like C++ and Java - it is up to the code in the subroutine to look at what it has been passed and behave accordingly.
No, it's not a bug. The remaining arguments are "undef" and you can check for this situation
sub foo {
my ($x, $y) = #_;
print " x is undef\n" unless defined $x;
print " y is undef\n" unless defined $y;
}
foo(1);
prints
y is undef

Is it a good practice to use self invoking anonymous function in Perl?

It is a common practice to use self invoking anonymous functions to scope variables etc. in JavaScript:
;(function() {
...
})();
Is it a good practice to use such functions in Perl ?
(sub {
...
})->();
Or is it better for some reason to use main subroutine ?
sub main {
...
}
main();
Perl has lexical scoping mechanisms JS lacks. You are better off simply enclosing code you want scoped somehow in a block, e.g.:
{
my $localvar;
. . .
}
In this case $localvar will be completely invisible outside of those braces; that is also the same mechanism one can use to localise builtin variables such as $/:
{
local $/ = undef;
#reading from a file handle now consumes the entire file
}
#But not out here
(Side note: never set $/ globally. It can break things in subtle and horrible ways if you forget to set it back when you're done, or if you call other code before restoring it.)
In perl, the best practise is to put things in subs when it makes sense; when it doesn't make sense or unnecessarily complicates the code, lexical blocks ensure scoping; if you do need anonymous subroutines (generally for callbacks or similar) then you can do my $subref = sub { . . . }; or even just stick the sub declaration directly into a function argument: do_something(callback => sub { . . . });
Note: see also ysth's answer for a resource-related advantage to self-invoking anonymous subs.
Since perl provides lexically scoped variables (and, as of 5.18, lexical named subs), there is no scoping reason for doing that.
The only reason to do it that I can think of would be for memory management; if the sub in question is a closure (references at least one external lexical variable), any memory used by the sub will be totally freed instead of retained for reuse on the next call:
$ perl -MDevel::Peek -wle'sub { my $x; Dump $x; $x = 42 }->() for 1..2'
SV = NULL(0x0) at 0x944a88
REFCNT = 1
FLAGS = (PADMY)
SV = IV(0x944a78) at 0x944a88
REFCNT = 1
FLAGS = (PADMY)
IV = 42
$ perl -MDevel::Peek -wle'my $y; sub { $y if 0; my $x; Dump $x; $x = 42 }->() for 1..2'
SV = NULL(0x0) at 0x259d238
REFCNT = 1
FLAGS = (PADMY)
SV = NULL(0x0) at 0x259d220
REFCNT = 1
FLAGS = (PADMY)
Though if you are not concerned about memory, this would be a disadvantage.
It's not unheard of but not common either. To restrict variable scope temporarily, it's much more common to use a block with a my variable declaration:
...
{
my $local_variable;
...
}
In Javascript, self-invoking functions have two uses:
Variable scoping. The var declarations are hoisted into the scope of the first enclosing function or into global scope. Therefore,
function () {
if (true) {
var foo = 42
}
}
is the same as
function () {
var foo
if (true) {
foo = 42
}
}
– often an unwanted effect.
Statements on the expression level. Sometimes you need multiple statements to compute something, but want to do so inside an expression.
largeObject = {
...,
// sum from 1 to 42
sum: (function(n){
var sum = 0;
for(var i = 1; i <= n; i++)
sum += i;
return sum;
})(42),
...,
};
Perl has no need for self-invoking functions as a scoping mechanism, because a new scope is introduced by any curly brace. A bare block is always allowed on a statement level:
...
my $foo = 10;
{
my $foo = 42;
}
$foo == 10 or die; # lives
Perl has reduced need for self-invoking functions to introduce statements into an expression because of the do BLOCK builtin:
%large_hash = (
...,
sum => do {
my $sum = 0;
$sum += $_ for 1 .. 42;
$sum;
},
...,
);
However, you will sometimes want to short-curcuit in such a block. As return exits the surrounding subroutine (not block), it can be quite useful here. For example in a memoized function:
# moronic cached division by two
my %cache;
sub lookup {
my $key = shift;
return $cache{$key} //= sub {
for (1 .. 100) {
return $_ if $_ * 2 == $key
}
return;
}->();
}

Perl Inline C: Passing Arrayref to C Function

I can't get arrayrefs passed into a C function using Inline C. I would like some help, please.
First, just to prove I can get Inline C to work, I'll pass a scalar value to a C function:
#!/usr/bin/perl -I.
#
# try1.pl
#
use Inline C;
my $c = 3.8;
foo( $c );
__END__
__C__
void foo( double c )
{
printf( "C = %f\n", c );
}
And run it:
% ./try1.pl
C = 3.800000
Now do the same thing, but with an arrayref:
#!/usr/bin/perl -I.
#
# try2.pl
#
use Inline C;
my #abc = (1.9, 2.3, 3.8);
foo( \#abc );
__END__
__C__
void foo( double *abc )
{
printf( "C = %f\n", abc[2] );
}
Run it:
% ./try2.pl
Undefined subroutine &main::foo called at ./try1.pl line 7.
Any ideas what I'm doing wrong? Help greatly appreciated!
Inline::C is smart enough to extract values from SV's based on your C function's type signature. But if you want to pass complex Perl structures to C functions you'll need to use the Perl API to extract the values. So, here's what you need to know for this problem:
An array is an instance of a C struct called AV. A reference is implemented by a struct called an RV. All of these are "subtypes" (kinda) of a base struct called SV.
So to make this function work we need to do a few things.
Change the parameter type to SV * (pointer to an SV).
Use the API to check if this particular SV is a reference as opposed to some other kind of scalar
Check the RV to make sure it's pointing at an array and not something else.
Dereference the RV to get the SV that it points to.
Since we know that SV is an array, cast it to AV and start working with it.
Lookup the third element of that array, which is another SV.
Check that the SV we got from the array is a numerical value suitable for C printf
Extract the actual numerical out of the SV.
Print the message
So putting that all together, we get something like this:
use Inline C;
my #abc = (1.9, 2.3, 3.8);
foo( \#abc );
__END__
__C__
void foo( SV *abc )
{
AV *array; /* this will hold our actual array */
SV **value; /* this will hold the value we extract, note that it is a double pointer */
double num; /* the actual underlying number in the SV */
if ( !SvROK( abc ) ) croak( "param is not a reference" );
if ( SvTYPE( SvRV( abc ) ) != SVt_PVAV ) croak( "param is not an array reference" );
/* if we got this far, then we have an array ref */
/* now dereference it to get the AV */
array = (AV *)SvRV( abc );
/* look up the 3rd element, which is yet another SV */
value = av_fetch( array, 2, 0 );
if ( value == NULL ) croak( "Failed array lookup" );
if ( !SvNOK( *value ) ) croak( "Array element is not a number" );
/* extract the actual number from the SV */
num = SvNV( *value );
printf( "C = %f\n", num );
}
Kinda makes you appreciate how much work Perl does under-the-hood. :)
Now, you don't have to be as super-explicit as that example. You could get rid of some of the temp variables by doing things inline, e.g.
printf( "C = %f\n", SvNV( *value ) );
would eliminate the need to declare num. But I wanted to make it clear how much dereferencing and type-checking is needed to traverse a Perl structure in C.
And as #mob points out below, you don't actually have to do all that work (though it's a good idea to be familiar with how it works.)
Inline::C is smart enough that if you declare your function as
void foo( AV *abc ) {
...
}
It will automatically unwrap the AV for you and you can go straight to the av_fetch step.
If all of that seems baffling to you, I highly recommend taking a look at:
The Perlguts Illustrated PDF, then
The perlguts manpage, and then
The Inline::C Cookbook, while consulting
The perlapi manpage.
In your Inline::C code:
void foo( SV *reference ) {
AV *array;
array = (AV *)SvRV( reference );
...
}
Then deal with the array value as the AV type. See the Perl Inline::C Cookbook.
Wrong data type.
use Inline 'C';
my #abc = (1.9, 2.3, 3.8);
foo( \#abc );
__END__
__C__
void foo(SV* abc) {
sv_dump(abc);
}