Imagine I have a referece that points to an array that contains many anonimous arrays. Ex:
my #main_array = ( [1,2,3], [3,4,5], ['a','b','c'] );
my $reference = \#main_array
If later on I'm done using the data from that array and I only have a reference to it, what is the best method to delete that array and free the memory?
I usually do the following to free the memory used by data in a simple array:
undef #array
but because I only have a reference to it I thought about doing this
undef #{$reference}
If I do that, wouldn't I just be deleting the references to the anonymous arrays stored in the array (main_array) and not the actual content of the anonymous arrays?
I guess my question can be simplify as this: Does deleting a reference makes Perl free the memory used by the array, hash or scalar referred by the reference?
Thank you
Yes, undef #{$reference} (or undef #$reference) will do what undef #array did. It will free almost all memory used by the array to be reusable by the program.
But there is very rarely any good reason to do this. When your lexical $reference goes out of scope, the same thing will happen. Explicitly calling undef on it first will just make your code minutely slower.
If later on I'm done using the data from that array and I only have a reference to it, what is the best method to delete that array and free the memory?
Ideally, just let $reference go out of scope. Otherwise, you can use $reference = undef;.
Related
Is there any difference in doing the following, efficiency, bad practice...?
(In a context of bigger hashes and sending them through many functions)
sub function {
my ($self, $hash_ref) = #_;
my %hash = %{$hash_ref};
print $hash{$key};
return;
}
Compared to:
sub function {
my ($self, $hash_ref) = #_;
print $hash_ref->{$key};
return;
}
Let's say %$hash_ref contains N elements.
The first snippet does the following in addition to what the second snippet does:
Creates N scalars. (Can involve multiple memory allocations each.)
Adds N*2 scalars to the stack. (Cheap.)
Creates a hash. (More memory allocations...)
Adds N elements to a hash. (More memory allocations...)
The second snippet does the following in addition to what the first snippet does:
[Nothing. It's a complete subset of the first snippet]
The first snippet is therefore far less efficient than the second. It also more complicated by virtue of having extra code. The complete lack of benefit and the numerous costs dictate that one should avoid the pattern used in the first snippet.
1st snippet is silly. But it's convenient practice to emulate named arguments:
sub function {
my ($self, %params ) = #_;
...
}
So, pass arrays/hashes by reference, creation of new (especially big) hash will be much slower. But there is nothing bad in "named arguments" hack.
And did you now that there exist key/value slice (v5.20+ only)? You can copy part of hash easily this way:
my %foo = ( one => 1, two => 2, three => 3, four => 4);
my %bar = %foo{'one', 'four'};
More information in perldoc perldata
The first version of the sub creates a local copy of the data structure which reference is passed to it. As such it is far less efficient, of course.
There is one legitimate reason for this: to make sure the data in the caller isn't changed. That local %hash can be changed in the sub as needed or convenient and the data in the calling code is not affected. This way the data in the caller is also protected against accidental changes.
Another reason why a local copy of data is done, in particular with deeper data structures, is to avoid long chains of dereferencing and thus simplify code; so parts of deep hierarchies may be copied for simpler access. Then this is merely for (presumed) programming convenience.
So in the shown example there'd be absolutely no reason to make a local copy. However, presumably the question is about subs where more work is done and then what's best depends on details.
In Perl, if I create two references to an array element, the two pointers are equal.
my $ref1 = \$array[0];
my $ref2 = \$array[0];
print "$ref1\n$ref2";
The same applies to two references to a variable which stores a string, those pointers are equal.
If I create two variables in which I store equal strings, the references are not equal.
Why aren't they equal? The same data is stored in different places?
Does Perl refer to the variable instead of the memory location?
In Java, as you can see here, four equal strings refer to the same memory location.
Can somebody please explain that?
If you take several reference of one variable, all of them will point to the same memory location.
my $foo = 'foo';
my $ref1 = \$foo;
my $ref2 = \$foo;
say $ref1;
say $ref2;
The value behind $ref1 and $ref2 is the same, because they both point to the same variable.
SCALAR(0x171f7b0)
SCALAR(0x171f7b0)
If you assign the string (not the same variable containing a string) to two new variables and then take references for those, they will be different.
my $foo = 'foo';
my $bar = 'bar';
my $ref1 = \$foo;
my $ref2 = \$bar;
say $ref1;
say $ref2;
Here, $ref1 and $ref2 are not the same, because they are references to two different variables.
SCALAR(0x20947b0)
SCALAR(0x2094678)
The fact that both variables hold an equal value doesn't matter.
my $ref1 = \'foo';
my $ref2 = \'foo';
say $ref1;
say $ref2;
The same goes for if you directly take references to values without putting them into another variable first.
SCALAR(0x1322528)
SCALAR(0x12ee6f0)
Perl handles the memory internally, but it's not like it has a table with every possible string, and whenever you create a reference, it just uses that memory address.
In Java, strings are objects. Java knows about the objects that are available to it. When you define a string, it makes an object. That is not the case in Perl. Strings and numbers are just values, and they are put into memory when you use them.
What Perl does is keep track of its references. It will free up memory by removing unused values, but only if there are no more references to those values living somewhere else in the running program. That's called the ref count.
You can read about references in perlref. There also is a pretty good explanation of how all of this works in the book Object Oriented Perl by Damian Conway at Mannings.
$self->[UTF8] = $conf->{utf8};
Never seen such code before.
What does [] mean here?
In this case, the object $self is implemented as a blessed array reference rather than the far more common method of using a blessed hash reference. The syntax $foo->[42] accesses a single element from an array reference. Presumably, UTF8 is a constant that returns a numeric index into the array.
You see this idiom sometimes when people become convinced (usually incorrectly) that hash lookups on object attributes result in significant overhead and try to prematurely optimize their code.
The [] implies that $self is a reference to a list/array (assuming the code works). This looks a bit odd, though, as list indexes should be numeric.
I have an ActiveState PerlCtrl project. I'd like to know if it's possible to have a hash in the COM DLL, pass it's ref out to the calling process as a string (e.g. "HASH(0x2345)") and then pass that string back into the COM DLL and somehow bless it back into pointing to the relevant hash.
Getting the hashref seems easy enough, using return "" . \%Graph; and I have tried things like $Graph = shift; $Graph = bless {%$Graph}; but they don't seem to achieve what I'm after. The %Graph hash is at least global to the module.
The testing code (VBScript):
set o = CreateObject("Project.BOGLE.1")
x = o.new_graph()
wscript.echo x
x = o.add_vertex(x, "foo")
If these are different processes, you will need to either serialize the content of the hash or persistently store it in a disk file. To do the former, see Storable or Data::Dumper; for the latter, it depends whether it's a hash of simple scalars or something more complex.
If it is the same perl interpreter in the same process, you can keep some global variable like %main::hashes;
set $main::hashes{\%Graph} = \%Graph before passing the stringified reference back to the calling process, then later use it to look up the actual hash reference.
Don't do this, though: http://perlmonks.org/?node_id=379395.
No, you can't reliably pass hash references between processes.
I am mulling over a best practice for passing hash references for return data to/from functions.
On the one hand, it seems intuitive to pass only input values to a function and have only return output variables. However, passing hashes in Perl can only be done by reference, so it is a bit messy and would seem more of an opportunity to make a mistake.
The other way is to pass a reference in the input variables, but then it has to be dealt with in the function, and it may not be clear what is an input and what is a return variable.
What is a best practice regarding this?
Return references to an array and a hash, and then dereference it.
($ref_array,$ref_hash) = $this->getData('input');
#array = #{$ref_array};
%hash = %{$ref_hash};
Pass in references (#array, %hash) to the function that will hold the output data.
$this->getData('input', \#array, \%hash);
Just return the reference. There is no need to dereference the whole
hash like you are doing in your examples:
my $result = some_function_that_returns_a_hashref;
say "Foo is ", $result->{foo};
say $_, " => ", $result->{$_} for keys %$result;
etc.
I have never seen anyone pass in empty references to hold the result. This is Perl, not C.
Trying to create copies by saying
my %hash = %{$ref_hash};
is even more dangerous than using the hashref. This is because it only creates a shallow copy. This will lead you to thinking it is okay to modify the hash, but if it contains references they will modify the original data structure. I find it better to just pass references and be careful, but if you really want to make sure you have a copy of the reference passed in you can say:
use Storable qw/dclone/;
my %hash = %{dclone $ref_hash};
The first one is better:
my ($ref_array,$ref_hash) = $this->getData('input');
The reasons are:
in the second case, getData() needs to
check the data structures to make
sure they are empty
you have freedom to return undef as a special value
it looks more Perl-idiomatic.
Note: the lines
#array = #{$ref_array};
%hash = %{$ref_hash};
are questionable, since you shallow-copy the whole data structures here. You can use references everywhere where you need array/hash, using -> operator for convenience.
If it's getting complicated enough that both the callsite and the called function are paying for it (because you have to think/write more every time you use it), why not just use an object?
my $results = $this->getData('input');
$results->key_value_thingies;
$results->listy_thingies;
If making an object is "too complicated" then start using Moose so that it no longer is.
My personal preference for sub interfaces:
If the routine has 0-3 arguments, they may be passed in list form: foo( 'a', 12, [1,2,3] );
Otherwise pass a list of name value pairs. foo( one => 'a', two => 12, three => [1,2,3] );
If the routine has or may have more than one argument seriously consider using name/value pairs.
Passing in references increases the risk of inadvertent data modification.
On returns I generally prefer to return a list of results rather than an array or hash reference.
I return hash or array refs when it will make a noticeable improvement in speed or memory consumption (ie BIG structures), or when a complex data structure is involved.
Returning references when not needed deprives one of the ability to take advantage of Perl's nice list handling features and exposes one to the dangers of inadvertent modification of data.
In particular, I find it useful to assign a list of results into an array and return the array, which provides the contextual return behaviors of an array to my subs.
For the case of passing in two hashes I would do something like:
my $foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets number of items returned
my #foo = foo( hash1 => \%hash1, hash2 => \%hash2 ); # gets items returned
sub foo {
my %arg = #_;
# do stuff
return #results;
}
I originally posted this to another question, and then someone pointed to this as a "related post", so I'll post it here to for my take on the subject, assuming people will encounter it in the future.
I'm going to contradict the Accepted Answer and say that I prefer to have my data returned as a plain hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value -- though someone pointed out that if your hash contains more than simple scalars it's not so simple.)
my %filtered_config_slice =
hashgrep { $a !~ /^apparent_/ && defined $b } (
map { $_->build_config_slice(%some_params, some_other => 'param') }
($self->partial_config_strategies, $other_config_strategy)
);
This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.
(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
If you don't intend to do anything resembling functional programming like this, or if you need more performance (have you profiled?) then sure, use hashrefs.
Uh... "passing hashes can only be done by reference"?
sub foo(%) {
my %hash = #_;
do_stuff_with(%hash);
}
my %hash = (a => 1, b => 2);
foo(%hash);
What am I missing?
I would say that if the issue is that you need to have multiple outputs from a function, it's better as a general practice to output a data structure, probably a hash, that holds everything you need to send out rather than taking modifiable references as arguments.