Tracking leak in a big async Perl process

I apologize in advance: this post does not contain a code sample.
I was assigned the task of debugging a memory leak in a module.
In this program I have a management object that holds data and other objects. The program uses async methods that update the management object from time to time.
I used the Perl module Devel::Peek to dump the object, and I was curious about the reference counts.
Since I am using a local variable to print this object, the top-level refcount is always 1, as expected.
At the second level, the refcount of the real management object is always greater than 1.
All other levels are always 1, as expected.
Here is an example:
SV = RV(0xbb3e244) at 0xbb3e238
REFCNT = 1
FLAGS = (PADMY,ROK)
RV = 0xcf19478
SV = PVHV(0xd0e1f98) at 0xcf19478
REFCNT = 6
FLAGS = (PADMY,OBJECT,OOK,SHAREKEYS)
STASH = 0x9b116a0 "<XXXXX>"
ARRAY = 0xd0ff190 (0:106, 1:104, 2:34, 3:10, 4:2)
hash quality = 105.4%
KEYS = 210
FILL = 150
MAX = 255
RITER = -1
EITER = 0x0
Elt "<XXXXX>" HASH = 0x10b5af01
SV = PVIV(0xce05510) at 0xcf07ba8
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 16200
PV = 0xd0fc0d8 "16200"\0
CUR = 5
LEN = 8
Elt "<XXXXX>" HASH = 0x3ebbb602
SV = PV(0xd10c810) at 0xcfb4350
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0xd2008d8 "<XXX>"\0
CUR = 4
LEN = 8
Elt "<XXXXX>" HASH = 0x1c7c0002
SV = RV(0xcf197f4) at 0xcf197e8
REFCNT = 1
FLAGS = (ROK)
RV = 0xd456ba0
SV = PVHV(0xd66a11c) at 0xd456ba0
REFCNT = 1
FLAGS = (PADMY,OOK,SHAREKEYS)
ARRAY = 0xd19a8d8 (0:3, 1:3, 2:2)
hash quality = 111.4%
KEYS = 7
FILL = 5
MAX = 7
RITER = -1
EITER = 0x0
Elt "<XXXXX>" HASH = 0x2d2f24a1
SV = RV(0xc2e3fcc) at 0xc2e3fc0
REFCNT = 1
FLAGS = (ROK)
RV = 0xd550548
I want to understand the reference counting process.
If I understand correctly, the management object is referenced from several locations, while the internal objects are referenced only once, from the management object.
Is it possible that updating internal fields of the management object from several locations causes a memory leak?

A typical problem in async (event-driven) programs is that objects are often referenced from within callbacks which are attached to some event loop, and one has to be really careful to clean everything up on error. Strategic use of weaken from Scalar::Util helps a lot here.
But once you have the mess it is really hard to debug. I usually use my own module Devel::TrackObjects to track down objects which do not get destroyed as expected, and I consider it easier to use for this purpose than Devel::Peek. But Devel::TrackObjects can only deal with objects and does not help with other kinds of circular references.
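A minimal sketch of the callback problem described above. The Manager class and the register_callback helper are hypothetical names invented for this illustration, not taken from the question. It shows that a closure stored inside an object keeps that object alive when it captures a strong reference to it, while capturing a weakened copy lets the object be destroyed as soon as the last outside reference goes away.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # counts DESTROY calls so the leak becomes observable

package Manager;      # hypothetical stand-in for the management object
sub new     { return bless { callbacks => [] }, shift }
sub DESTROY { $destroyed++ }

package main;

# Create a manager, attach a callback that refers back to it, then let
# every outside reference go out of scope.
sub register_callback {
    my ($use_weak) = @_;
    my $mgr  = Manager->new;
    my $self = $mgr;
    weaken($self) if $use_weak;    # the closure below captures $self
    push @{ $mgr->{callbacks} }, sub { defined $self ? 'alive' : 'gone' };
    return;
}

register_callback(0);              # strong capture: object -> closure -> object
my $after_strong = $destroyed;

register_callback(1);              # weak capture: no cycle
my $after_weak = $destroyed;

print "destroyed after strong capture: $after_strong\n";
print "destroyed after weak capture:   $after_weak\n";
```

In the strong case the object is only cleaned up at program exit, which in a long-running process amounts to a leak.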

Well, it's hard to answer directly, without any sort of idea what you're actually doing.
But yes, Perl uses reference counting to determine whether memory is still in use. It's perfectly possible to create a circular reference; the memory involved then never becomes eligible to be freed, so it leaks.
You can avoid this with the weaken function from the Scalar::Util module, which allows a reference to exist without being counted for reference counting.
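To make the counting concrete, here is a small sketch that reads the reference count through the core B module instead of parsing Devel::Peek output. The refcount helper and the data layout are assumptions made for this illustration. It suggests that a back-reference raises the count, that updating fields does not change it, and that weaken removes the back-reference from the count again.

```perl
use strict;
use warnings;
use B ();
use Scalar::Util qw(weaken);

# Hypothetical helper: reference count of the thing a reference points to.
# $_[0] is used directly so the helper does not add a reference of its own.
sub refcount { return B::svref_2object($_[0])->REFCNT }

my $mgr  = { name => 'manager', children => [] };
my $base = refcount($mgr);

# A child that points back at its parent: one more strong reference.
my $child = { parent => $mgr };
push @{ $mgr->{children} }, $child;
my $with_cycle = refcount($mgr);

# Updating fields of the object does not touch its reference count.
$mgr->{name} = 'renamed';
my $after_update = refcount($mgr);

# Weakening the back-reference takes it out of the count again.
weaken($child->{parent});
my $after_weaken = refcount($mgr);

print "base=$base cycle=$with_cycle update=$after_update weak=$after_weaken\n";
```

Read this way, a REFCNT of 6 on the management object means that six strong references to it exist somewhere (variables, closures, other data structures). It does not mean that its fields were updated from six places.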


Perl variable assignment side effects

I'll be the first to admit that Perl is not my strong suit. But today I ran across this bit of code:
my $scaledWidth = int($width1x * $scalingFactor);
my $scaledHeight = int($height1x * $scalingFactor);
my $scaledSrc = $Media->prependStyleCodes($src, 'SX' . $scaledWidth);
# String concatenation makes this variable into a
# string, so we need to make it an integer again.
$scaledWidth = 0 + $scaledWidth;
I could be missing something obvious here, but I don't see anything in that code that could make $scaledWidth turn into a string. Unless somehow the concatenation in the third line causes Perl to permanently change the type of $scaledWidth. That seems ... wonky.
I searched a bit for "perl assignment side effects" and similar terms, and didn't come up with anything.
Can any of you Perl gurus tell me if that commented line of code actually does anything useful? Does using an integer variable in a concatenation expression really change the type of that variable?
It is only a little bit useful.
Perl can store a scalar value as a number or a string or both, depending on what it needs.
use Devel::Peek;
Dump($x = 42);
Dump($x = "42");
Outputs:
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
PV = 0x178d9e0 "0"\0
CUR = 1
LEN = 16
SV = PVIV(0x139a808) at 0x178a0b8
REFCNT = 1
FLAGS = (POK,pPOK)
IV = 42
PV = 0x178d9e0 "42"\0
CUR = 2
LEN = 16
The IV and IOK tokens refer to how the value is stored as a number and whether the current integer representation is valid, while PV and POK indicate the string representation and whether it is valid. Using a numeric scalar in a string context can change the internal representation.
use Devel::Peek;
$x = 42;
Dump($x);
$y = "X" . $x;
Dump($x);
SV = IV(0x17969d0) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 42
SV = PVIV(0x139aaa8) at 0x17969e0
REFCNT = 1
FLAGS = (IOK,POK,pIOK,pPOK)
IV = 42
PV = 0x162fc00 "42"\0
CUR = 2
LEN = 16
Perl will seamlessly convert one to the other as needed, and there is rarely a need for the Perl programmer to worry about the internal representation.
I say rarely because there are some known situations where the internal representation matters.
Perl variables are not typed. Any scalar can be either a number or a string depending how you use it. There are a few exceptions where an operation is dependent on whether a value seems more like a number or string, but most of them have been either deprecated or considered bad ideas. The big exception is when these values must be serialized to a format that explicitly stores numbers and strings differently (commonly JSON), so you need to know which it is "supposed" to be.
The internal details are that a SV (scalar value) contains any of the values that have been relevant to its usage during its lifetime. So your $scaledWidth first contains only an IV (integer value) as the result of the int function. When it is concatenated, that uses it as a string, so it generates a PV (pointer value, used for strings). That variable contains both, it is not one type or the other. So when something like JSON encoders need to determine whether it's supposed to be a number or a string, they see both in the internal state.
There have been three strategies that JSON encoders have taken to resolve this situation. Originally, JSON::PP and JSON::XS would simply consider it a string if it contains a PV, or in other words, if it's ever been used as a string; and as a number if it only has an IV or NV (double). As you alluded to, this leads to an inordinate amount of false positives.
Cpanel::JSON::XS, a fork of JSON::XS that fixes a large number of issues, along with more recent versions of JSON::PP, use a different heuristic. Essentially, a value will still be considered a number if it has a PV but the PV matches the IV or NV it contains. This, of course, still results in false positives (example: you have the string '5', and use it in a numerical operation), but in practice it is much more often what you want.
The third strategy is the most useful if you need to be sure what types you have: be explicit. You can do this by reassigning every value to explicitly be a number or string as in the code you found. This assigns a new SV to $scaledWidth that contains only an IV (the result of the addition operation), so there is no ambiguity. Another method of being explicit is using an encoding method that allows specifying the types you want, like Cpanel::JSON::XS::Type.
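A rough sketch of what the explicit reassignment does to the internal state, using the core B module. The slots helper is a hypothetical name for this illustration. It checks the private flags (SVp_IOK, SVp_POK) because whether the public POK flag is set after stringifying an integer differs between Perl versions.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: which value slots of a scalar are flagged as valid.
# \$_[0] refers to the caller's variable itself, not to a copy.
sub slots {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @slots;
    push @slots, 'int' if $flags & B::SVp_IOK();
    push @slots, 'str' if $flags & B::SVp_POK();
    return join '+', @slots;
}

my $scaled_width = 42;
my $before = slots($scaled_width);          # only an integer so far

my $src = 'SX' . $scaled_width;             # string use caches a PV
my $after_concat = slots($scaled_width);

$scaled_width = 0 + $scaled_width;          # fresh numeric value
my $after_reset = slots($scaled_width);

print "before=$before concat=$after_concat reset=$after_reset\n";
```

An encoder that only looks at whether a string slot exists would treat the variable as a string after the concatenation and as a number again after the reassignment.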
The details of course vary if you're not talking about the JSON format, but that is where this issue has been most deliberated. This distinction is invisible in most Perl code, where the operation, not the value, determines the type.

What does flag `pIOK` mean?

When dumping perl SV with Devel::Peek I can see:
SV = IV(0x1c13168) at 0x1c13178
REFCNT = 1
FLAGS = (IOK,pIOK)
IV = 2
But I cannot find a description of what pIOK means.
I looked in Devel::Peek, perlapi, perlguts, perlxs ...
In the sources I found this:
{SVp_IOK, "pIOK,"}
But I still cannot find what SVp_IOK is. What is it?
UPD
I found this document. It sheds some light on what the flags mean and where they are located. (Beware: this document is a bit outdated.)
This flag indicates that the object has a valid non-public IVX field value. It can only be set for value type SvIV or subtypes of it.
UPD
Why private and public flags differ
pIOK is how Devel::Peek represents the bit corresponding to bit mask SVp_IOK. The p indicates a "private" flag, and it forms a pair with the "public" flag IOK (bit mask SVf_IOK).
The exact meaning of the private flags has changed across Perl versions, but in general terms they mean that the IV (or NV or PV) field of the SV is "inaccurate" in some way.
The most common situation where pIOK is set on its own (pIOK is always set if IOK is set) is where a PV has been converted to a numeric NV value. The NV and IV fields are both populated, but if the IV value isn't an accurate representation of the number (i.e. it has been truncated) then pIOK is set but IOK is cleared.
This code shows a way to reach that state. Variable $pi_str is set to a string value for π and it is converted to a floating-point value by adding 0.0 and storing it into $pi_num. Devel::Peek now shows that NOK/pNOK and POK/pPOK are set on $pi_str, but only pIOK is set while IOK remains clear. Looking at the IV value we can see why: it is set to 3, which is the cached value of int $pi_str in case we need it again, but it is not an accurate representation of the string "3.14159" in integer form.
use strict;
use warnings 'all';
use Devel::Peek 'Dump';
my $pi_str = "3.14159";
my $pi_num = $pi_str + 0.0;
Dump $pi_str;
output
SV = PVNV(0x28fba68) at 0x3f30d30
REFCNT = 1
FLAGS = (NOK,POK,IsCOW,pIOK,pNOK,pPOK)
IV = 3
NV = 3.14159
PV = 0x3fb7ab8 "3.14159"\0
CUR = 7
LEN = 10
COW_REFCNT = 1
Perl v5.16 and earlier used the flag to indicate "magic" variables (such as tied values) because the value in the IV field could not be used directly. That was changed in v5.18 and later, and magic values now use pIOK in the same way as any other value.

MATLAB: How to dynamically access variables

I declared some variables consisting of simple row vectors which represent input parameters for another function. Within a loop these variables should be used and the result assigned to a structure.
Now, my question is how to best access the content of the predefined variables. I found a solution using eval. However, I often read that the usage of eval should be avoided. Apparently it's not best practice. So, what's best practice for my problem?
varABC = [1,2,3];
varDEF = [4,5,6];
varGHI = [7,8,9];
names = {'ABC','DEF','GHI'};
result = {'result1','result2','result3'};
for i = 1 : 3
varString = strcat('var',names{i});
test.(result{i}) = sum(eval(varString));
end
I would suggest rewriting your code a little bit
names = {'ABC','DEF','GHI'};
result = {'result1','result2','result3'};
option 1
% Use struct instead of a simple variable
var.ABC = [1,2,3];
var.DEF = [4,5,6];
var.GHI = [7,8,9];
for i = 1 : 3
test.(result{i}) = sum(var.(names{i}));
end
option 2
% Use dictionary
c = containers.Map;
c('ABC') = [1,2,3];
c('DEF') = [4,5,6];
c('GHI') = [7,8,9];
for i = 1 : 3
test.(result{i}) = sum(c(names{i}));
end

Matlab coder & dynamic field references

I'm trying to conjure up a little parser that reads a .txt file containing parameters for an algorithm so I don't have to recompile it every time I change a parameter. The application is C code generated from .m files via Coder, which unfortunately prohibits me from using a lot of handy MATLAB gimmicks.
Here's my code so far:
% read textfile
string = readfile(filepath);
% do fancy rearranging
linebreaks = zeros(size(string));
equals = zeros(size(string));
% find delimiters
for n=1:size(string,2)
if strcmp(string(n),char(10))
linebreaks(n) = 1;
elseif strcmp(string(n), '=')
equals(n) = 1;
end
end
% write first key-value pair
idx_s = find(linebreaks);idx_s = [idx_s length(string)];
idx_e = find(equals);
key = string(1:idx_e(1)-1);
value = str2double(string(idx_e(1)+1:idx_s(1)-1));
parameters.(key) = value;
% find number of parameters
count = length(idx_s);
% write remaining key-value pairs
for n=2:count
key = string(idx_s(n-1)+1:idx_e(n)-1);
value = str2double(string(idx_e(n)+1:idx_s(n)-1));
parameters.(key) = value;
end
The problem is that Coder apparently does not support dynamic field names for structures, such as parameters.(key) = value.
I'm a bit at a loss as to how else I am supposed to come up with a parameter struct that holds all my key-value pairs without hardcoding it. It would somewhat (though not completely) defeat the purpose if the names of keys were not dynamically linked to the parameter file (more manual work if parameters get added/deleted, etc.). If anybody has an idea how to work around this, I'd be very grateful.
As you say, dynamic fieldnames for structures aren't allowed in MATLAB code to be used by Coder. I've faced situations much like yours before, and here's how I handled it.
First, we can list some nice tools that are allowed in Coder. We're allowed to have classes (value or handle), which can be quite handy. Also, we're allowed to have variable sized data if we use coder.varsize to specifically designate it. We also can use string values in switch statements if we like. However, we cannot use coder.varsize for properties in a class, but you can have varsized persistent variables if you like.
What I'd do in your case is create a handle class for storing and retrieving the values. The following example is pretty basic, but will work and could be expanded. If a persistent variable were used in a method, you could even create a varsized allocated storage for the data, but in my example, it's a property and has been limited in the number of values it can store.
classdef keyval < handle %#codegen
%KEYVAL A key and value class designed for Coder
% Stores an arbitrary number of keys and values.
properties (SetAccess = private)
numvals = 0
end
properties (Access = private)
intdata
end
properties (Constant)
maxvals = 100;
maxkeylength = 30;
end
methods
function obj = keyval
%KEYVAL Constructor for keyval class
obj.intdata = repmat(struct('key', char(zeros(1, obj.maxkeylength)), 'value', 0), 1, obj.maxvals);
end
function result = put(obj, key, value)
%PUT Adds a key and value pair into storage
% Result is 0 if successful, 1 on error
result = 0;
if obj.numvals >= obj.maxvals
result = 1;
return;
end
obj.numvals = obj.numvals + 1;
tempstr = char(zeros(1,obj.maxkeylength));
tempstr(1,1:min(end,numel(key))) = key(1:min(end, obj.maxkeylength));
obj.intdata(obj.numvals).key = tempstr;
obj.intdata(obj.numvals).value = value;
end
function keystring = getkeyatindex(obj, index)
%GETKEYATINDEX Get a key name at an index
keystring = deblank(obj.intdata(index).key);
end
function value = getvalueforkey(obj, keyname)
%GETVALUEFORKEY Gets a value associated with a key.
% Returns NaN if not found
value = NaN;
for i=1:obj.numvals
if strcmpi(keyname, deblank(obj.intdata(i).key))
value = obj.intdata(i).value;
end
end
end
end
end
This class implements a simple key/value addition as well as lookup. There are a few things to note about it. First, it's very careful in the assignments to make sure we don't overrun the overall storage. Second, it uses deblank to clear out the trailing zeros that are necessary in the string storage. In this situation, it's not permitted for the strings in the structure to be of different length, so when we put a key string in there, it needs to be exactly the same length with trailing nulls. Deblank cleans this up for the calling function.
The constant properties allocate the total amount of space we're allowed in the storage array. These can be increased, obviously, but not at runtime.
At the MATLAB command prompt, using this class looks like:
>> obj = keyval
obj =
keyval with properties:
numvals: 0
>> obj.put('SomeKeyName', 1.23456)
ans =
0
>> obj
obj =
keyval with properties:
numvals: 1
>> obj.put('AnotherKeyName', 34567)
ans =
0
>> obj
obj =
keyval with properties:
numvals: 2
>> obj.getvalueforkey('SomeKeyName')
ans =
1.2346
>> obj.getkeyatindex(2)
ans =
AnotherKeyName
>> obj.getvalueforkey(obj.getkeyatindex(2))
ans =
34567
If a totally variable storage area is desired, the use of persistent variables with coder.varsize would work, but that will limit the use of this class to a single instance. Persistent variables are nice, but you only get one of them ever. As written, you can use this class in many different places in your program for different storage. If you use a persistent variable, you may only use it once.
If you know some of the key names and are later using them to determine functionality, remember that you can switch on strings in MATLAB, and this works in Coder.

What do Perl functions that return Boolean actually return

The Perl defined function (and many others) returns "a Boolean value".
Given Perl doesn't actually have a Boolean type (and uses values like 1 for true, and 0 or undef for false) does the Perl language specify exactly what is returned for a Boolean values? For example, would defined(undef) return 0 or undef, and is it subject to change?
In almost all cases (i.e. unless there's a reason to do otherwise), Perl returns one of two statically allocated scalars: &PL_sv_yes (for true) and &PL_sv_no (for false). This is them in detail:
>perl -MDevel::Peek -e"Dump 1==1"
SV = PVNV(0x749be4) at 0x3180b8
REFCNT = 2147483644
FLAGS = (PADTMP,IOK,NOK,POK,READONLY,pIOK,pNOK,pPOK)
IV = 1
NV = 1
PV = 0x742dfc "1"\0
CUR = 1
LEN = 12
>perl -MDevel::Peek -e"Dump 1==0"
SV = PVNV(0x7e9bcc) at 0x4980a8
REFCNT = 2147483647
FLAGS = (PADTMP,IOK,NOK,POK,READONLY,pIOK,pNOK,pPOK)
IV = 0
NV = 0
PV = 0x7e3f0c ""\0
CUR = 0
LEN = 12
yes is a triple var (IOK, NOK and POK). It contains a signed integer (IV) equal to 1, a floating point number (NV) equal to 1, and a string (PV) equal to 1.
no is also a triple var (IOK, NOK and POK). It contains a signed integer (IV) equal to 0, a floating point number (NV) equal to 0, and an empty string (PV). This means it stringifies to the empty string, and it numifies to 0. It is neither equivalent to an empty string
>perl -wE"say 0+(1==0);"
0
>perl -wE"say 0+'';"
Argument "" isn't numeric in addition (+) at -e line 1.
0
nor to 0
>perl -wE"say ''.(1==0);"
>perl -wE"say ''.0;"
0
There's no guarantee that this will always remain the case. And there's no reason to rely on this. If you need specific values, you can use something like
my $formatted = $result ? '1' : '0';
They return a special false value that is "" in string context but 0 in numeric context (without a non-numeric warning). The true value isn't so special, since it's 1 in either context. defined() does not return undef.
(You can create similar values yourself with e.g. Scalar::Util::dualvar(0,"").)
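A short sketch comparing the built-in false value, a dualvar(0, "") and a plain empty string. The variable names and the warning-capturing handler are assumptions for this illustration. It records the numeric form, the string form, and whether numifying the value raised a warning.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };   # capture instead of printing

my $undef;
my $builtin = defined($undef);    # Perl's own false value
my $dual    = dualvar(0, '');     # hand-made equivalent
my $empty   = '';                 # plain empty string

my %seen;
for my $case ([builtin => $builtin], [dualvar => $dual], [empty => $empty]) {
    my ($name, $value) = @$case;
    @warnings = ();
    my $num = 0 + $value;         # numeric context
    my $str = '' . $value;        # string context
    $seen{$name} = { num => $num, str => $str, warned => scalar @warnings };
}

for my $name (qw(builtin dualvar empty)) {
    printf "%-8s num=%s str='%s' warnings=%d\n",
        $name, @{ $seen{$name} }{qw(num str warned)};
}
```

All three stringify to the empty string and numify to 0; only the plain empty string triggers the "isn't numeric" warning.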
Since that's the official man page I'd say that its exact return value is not specified. If the Perl documentation talks about a Boolean value then it almost always talks about evaluating said value in a Boolean context: if (defined ...) or print while <> etc. In such contexts several values evaluate to false: 0, undef, "" (the empty string), and even the string "0".
All other values evaluate to true in a Boolean context, including the infamous example "0 but true".
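The truthiness rules above can be checked with a few lines. The two lists are illustrative picks for this sketch, not an exhaustive enumeration.

```perl
use strict;
use warnings;

# Values that are false in Boolean context.
my @false = (0, '0', '', undef);

# Values that look "zero-ish" or empty but are nevertheless true.
my @true = ('0 but true', '0.0', '00', ' ', 'a', 1, -1);

my @wrongly_true  = grep { $_ } @false;    # should stay empty
my @wrongly_false = grep { !$_ } @true;    # should stay empty

printf "false values that tested true: %d\n", scalar @wrongly_true;
printf "true values that tested false: %d\n", scalar @wrongly_false;
```

Note that only the exact string "0" is false; strings such as "0.0" or "00" are true even though they numify to zero.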
As the documentation is that vague I would not ever rely on defined() returning any specific value for the undefined case. However, you'll always be OK if you simply use defined() in a Boolean context without comparing it to a specific value.
OK: print "yes\n" if defined($var)
Not portable/future proof: print "yes\n" if defined($var) eq '' or something similar
It probably won't ever change, but perl does not specify the exact boolean value that defined(...) returns.
When using Boolean values good code should not depend on the actual value used for true and false.
Example:
# not so great code:
my $bool = 0; #
...
if (some condition) {
$bool = 1;
}
if ($bool == 1) { ... }
# better code:
my $bool; # default value is undef which is false
$bool = some condition;
if ($bool) { ... }
99.9% of the time there is no reason to care about the value used for the boolean.
That said, there are some cases when it is better to use an explicit 0 or 1 instead of the boolean-ness of a value. Example:
sub foo {
my $object = shift;
...
my $bool = $object;
...
return $bool;
}
The intent is that foo() is called with either a reference or undef and should return false if $object is not defined. The problem is that if $object is defined, foo() will return the object itself and thus create another reference to the object, which may interfere with its garbage collection. So it would be better to use an explicit boolean value here, i.e.:
my $bool = $object ? 1 : 0;
So be careful about using a reference itself to represent its truthiness (i.e. its defined-ness) because of the potential for creating unwanted references to the reference.
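A minimal sketch of that effect. The Resource class and both helper subs are hypothetical names made up for this illustration. A flag that secretly holds the reference delays destruction of the object, while an explicit 1 or 0 does not.

```perl
use strict;
use warnings;

my $destroyed = 0;    # counts DESTROY calls

package Resource;     # hypothetical object with an observable destructor
sub new     { return bless {}, shift }
sub DESTROY { $destroyed++ }

package main;

sub flag_from_reference { my $object = shift; return $object }          # leaks a reference
sub flag_explicit       { my $object = shift; return $object ? 1 : 0 }  # plain boolean

my $flag;

{
    my $res = Resource->new;
    $flag = flag_explicit($res);
}
my $after_explicit = $destroyed;     # object is gone once the block ends

{
    my $res = Resource->new;
    $flag = flag_from_reference($res);
}
my $after_reference = $destroyed;    # $flag still holds the second object

undef $flag;
my $after_release = $destroyed;      # now the second object is destroyed too

print "explicit=$after_explicit reference=$after_reference release=$after_release\n";
```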