Copy on Write for References - perl

Perl currently supports Copy on Write (CoW) for scalar variables however it doesn't appear to have anything for hashrefs and arrayrefs.
Perl does, however, have subroutines to modify variable internals like weaken so I'm guessing that there might exist a solution.
I have a situation where I have a large structure I'm returning from a package which keeps an internal state of this large structure. I want to ensure that if either the returned references or the internal reference (which are both currently the same reference) is modified that I end up with a Copy-on-write situation where the data the references are pointing to is copied, modified and the reference used to modified the data is updated to point to the new data.
package SomePackage;
use Moose;
has some_large_internal_variable_ref => (
'is' => 'rw',
'isa' => 'HashRef',
);
sub some_operation {
my ($self) = #_;
$self->some_large_internal_variable_ref({
# create some large result that is different every time
});
}
sub get_result {
my ($self) = #_;
return $self->some_large_internal_variable_ref;
}
1;
use strict;
use warnings;
use SomePackage;
use Test::More;
# Situtation 1 where the internally stored reference is modified
# This will pass!
my $package = SomePackage->new();
$package->some_operation();
my $result1 = $package->get_result();
$package->some_operation();
my $result2 = $package->get_result();
isnt($result1, $result2, "These two references should no longer be the same");
# Situtation 2 where the externally stored references is modified
# This will fail
$package = SomePackage->new();
$package->some_operation();
$result1 = $package->get_result();
$result1->{foo} = "bar";
$result2 = $package->get_result();
isnt($result1, $result2, "These two references should no longer be the same");
done_testing;
I'm trying to avoid a situation where I have to clone the values on the get_result return as this would result in a situation where memory usage is doubled.
I'm hoping there is some form of weaken I can call on the variable to indicate that, should a modification be made to behave with Copy on Write behaviour.

Related

Why is the Hashref passed to the Net::Ping constructor, set to an empty hashref after Net::Ping->new($args)?

What am I missing here?
When passing arguments to Net::Ping like this, then $args and $args_copy will both be set to an empty hashref after initializing the constructor Net::Ping->new($args).
use strict;
use warnings;
use Data::Dumper qw(Dumper);
use Net::Ping;
sub _ping {
my ($args) = #_;
my $p = Net::Ping->new($args);
$p->close();
}
my $args = { proto => 'udp' };
my $args_copy = $args;
print Dumper $args; # $VAR1 = { 'proto' => 'udp' }
print Dumper $args_copy; # $VAR1 = { 'proto' => 'udp' }
_ping($args);
print Dumper $args; # $VAR1 = {}
print Dumper $args_copy; # $VAR1 = {}
I see the same behavior on both Strawberry Perl and WSL2 running Ubuntu 20.04.4 LTS with Perl v5.30.0.
This is interesting, a class (constructor) deleting caller's data.
The shown code passes a reference to the Net::Ping constructor and that data gets cleared, and right in the constructor (see below).
To avoid having $args cleared, if that is a problem, pass its copy instead
_ping( { %$args } );
This first de-references the hash and then constructs an anonymous hash reference with it,† and passes that. So $args is safe.
The constructor new uses data from #_ directly (without making local copies), and as it then goes through the keys it also deletes them, I presume for convenience in further processing. (I find that scary, I admit.)
Since a reference is passed to new the data in the calling code can get changed.‡
† When copying a hash (or array) with a complex data structure in it -- when its values themselves contain references -- we need to make a deep copy. One way is to use Storable for it
use Storable qw(dclone);
my $deep_copy = dclone $complex_data_structure;
Here that would mean _ping( dclone $args );. It seems that new can only take a reference to a flat hash (or scalars) so this wouldn't be necessary.
‡ When a sub works directly with the references it gets then it can change data in the caller
sub sub_takes_ref {
my ($ref_data) = #_;
for my $k (keys %$ref_data) {
$ref_data->{$k} = ...; # !!! data in caller changed !!!
}
}
...
my $data = { ... }; # a hashref
sub_takes_ref( $data );
However, if a local copy of arguments is made in the sub then caller's data cannot be changed
use Storable qw(dclone); # for deep copy below
sub sub_takes_ref {
my ($ref_data) = #_;
my $local_copy_of_data = dclone $ref_data;
for my $k (keys %$local_copy_of_data) {
$local_copy_of_data->{$k} = ...; # data in caller safe
}
}
(Just remember to not touch $ref_data but to use the local copy.)
This way of changing data in the caller is of course useful when the sub is meant to work on data structures with large amounts of data, since this way they don't have to be copied. But when that is not the purpose of the sub then we need to be careful, or just make a local copy to be safe.

Completely destroy all traces of an object in Perl

I am writing a Perl script that gets data from a changing outside source. As data is added or removed from this outside source I would like the data structures in my Perl script to mirror the changes. I was able to do this by creating objects that store the data in a similar fashion and synchronizing the data in those objects whenever I try to access the data in my Perl script.
This works great, and gets any new data that is added to this external source; however a problem arises when any data is removed. If any data is removed I want all existing references to it in my Perl script to be destroyed as well so the user can no longer attempt to access the data without raising an error.
The only way I could think of was to undefine the internal reference to the data as soon as it is determined that the data no longer exists. It appears, however; that when a reference is undefined it doesn't remove the data stored in the location of the reference, it only deletes the data from the variable that holds the reference.
Here is a test script demonstrating my problem:
#!/usr/bin/perl
# Always use these
use strict;
use warnings;
####################################################################################################
# Package to create an object
package Object;
use Moose;
# Define attributes
has 'name' => (is => 'ro', isa => 'Str', required => 1);
####################################################################################################
# Package to test with
package test;
# Create an object
my $object1 = Object->new('name' => 'Test Object');
print 'OBJECT1 NAME: '.$object1->name()."\n";
# Create another reference to the object
my $object2 = $object1;
# Print the name
print 'OBJECT2 NAME: '.$object2->name()."\n";
# Print both references
print "\n";
print "OBJ : $object1\n";
print "OBJ2 : $object2\n";
print "\n";
# Undefine the reference to object2
undef $object2;
# Try to print both names
print 'OBJECT1 NAME: '.$object1->name()."\n";
print 'OBJECT2 NAME: '.$object2->name()."\n";
Output:
How can I completely destroy all traces of an object so that any attempt to access it's data will result in an error?
EDIT:
Here is a different example that may explain better what I am trying to achieve.
Say I have a file object:
my $file = File->new();
Now I want to get the text in that file
my $text = $file->text();
Now I get it again (for some unknown reason)
my $text2 = $file->text();
I want to be able to modify $text and have it directly effect the contents of $text2 as well as change the actual text attribute of the file.
I'm basically trying to tie the variables together so if one changes they all change. Also if one is deleted they would all be deleted.
This would also mean if the text attribute is changed, $text1 and $text2 would also change with it to reflect the new value.
Could this be done using an alias of some sort?
Perl uses reference counting to "retire" data.
What your program is doing is as follows:
Create an object and assign a reference to $object1
Copy that reference to $object2 and increment the object's reference count
Change the value of $object2 and decrement the object's reference count.
In the end you still have a hold of the object via $object1 and it's reference count will not drop to below 1 while you keep a hold of it. Technically as your program shuts down the $object1 variable will be destroyed as it goes out of scope, at which the object's reference count will drop to 0 and perl will look for and try to call it's DESTROY method.
If you truly want to "see" an item destroyed you may want to look into defining a DESTROY method that prints out a message upon object destruction. This way you can "let go" of your last reference and still see it's destruction.
You can't free an object that's still being used. Have the object itself keep track of whether it's still valid or not.
sub new {
my ($class, $data) = #_;
my $self = bless({}, $class);
$self->{valid} = 1;
$self->{data} = $data;
return $self;
}
sub delete {
my $self = shift;
undef(%$self);
}
sub data {
my $self = shift;
croak("Invalid object") if !$self->{valid};
$self->{data} = $_[0] if #_;
return $self->{data};
}
Example:
my $o1 = Class->new('big_complex_data');
my $o2 = $o1;
say $o1->data();
say $o2->data();
$o1->delete();
say $o2->name();
Output:
big_complex_data
big_complex_data
Invalid object at a.pl line 39.

Replacing a class in Perl ("overriding"/"extending" a class with same name)?

I am trying to Iterate directories in Perl, getting introspectable objects as result, mostly so I can print fields like mtime when I'm using Dumper on the returns from IO::All.
I have discovered, that it can be done, if in the module IO::All::File (for me, /usr/local/share/perl/5.10.1/IO/All/File.pm), I add the line field mtimef => undef;, and then modify its sub file so it runs $self->mtimef($self->mtime); (note, this field cannot have the same name (mtime) as the corresponding method/property, as those are dynamically assigned in IO::All). So, in essence, I'm not interested in "overloading", as in having the same name for multiple function signatures - I'd want to "replace" or "override" a class with its extended version (not sure how this is properly called), but under the same name; so all other classes that may use it, get on to using the extended version from that point on.
The best approach for me now would be, if I could somehow "replace" the IO::All::File class, from my actual "runnable" Perl script -- if somehow possible, by using the mechanisms for inheritance, so I can just add what is "extra". To show what I mean, here is an example:
use warnings;
use strict;
use Data::Dumper;
my #targetDirsToScan = ("./");
use IO::All -utf8 ; # Turn on utf8 for all io
# try to "replace" the IO::All::File class
{ # recursive inheritance!
package IO::All::File;
use IO::All::File -base;
# hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
field mtimef => undef; # hack
sub file {
my $self = shift;
bless $self, __PACKAGE__;
$self->name(shift) if #_;
$self->mtimef($self->mtime); # hack
return $self->_init;
}
1;
}
# main script start
my $io = io(#targetDirsToScan);
my #contents = $io->all(0); # Get all contents of dir
for my $contentry ( #contents ) {
print Dumper \%{*$contentry};
}
... which fails with "Recursive inheritance detected in package 'IO::All::Filesys' at /usr/local/share/perl/5.10.1/IO/All/Base.pm line 13."; if you comment out the "recursive inheritance" section, it all works.
I'm sort of clear on why this happens with this kind of syntax - however, is there a syntax, or a way, that can be used to "replace" a class with its extended version but of the same name, similar to how I've tried it above? Obviously, I want the same name, so that I wouldn't have to change anything in IO::All (or any other files in the package). Also, I would preferably do this in the "runner" Perl script (so that I can have everything in a single script file, and I don't have to maintain multiple files) - but if the only way possible is to have a separate .pm file, I'd like to know about it as well.
So, is there a technique I could use for something like this?
Well, I honestly have no idea what is going on, but I poked around with the code above, and it seems all that is required, is to remove the -base from the use IO::All::File statement; and the code otherwise seems to work as I expect it - that is, the package does get "overriden" - if you change this snippet in the code above:
# ...
{ # no more recursive inheritance!? IO::All::File gets overriden with this?!
package IO::All::File;
use IO::All::File; # -base; # just do not use `-base` here?!
# hacks work if directly in /usr/local/share/perl/5.10.1/IO/All/File.pm
field mtimef => undef; # hack
sub file {
my $self = shift;
bless $self, __PACKAGE__;
$self->name(shift) if #_;
$self->mtimef($self->mtime); # hack
print("!! *haxx0rz'd* file() reporting in\n");
return $self->_init;
}
1;
}
# ...
I found this so unbelievable, I even added the print() there to make sure it is the "overriden" function that runs, and sure enough, it is; this is what I get in output:
...
!! *haxx0rz'd* file() reporting in
$VAR1 = {
'_utf8' => 1,
'mtimef' => 1394828707,
'constructor' => sub { "DUMMY" },
'is_open' => 0,
'io_handle' => undef,
'name' => './test.blg',
'_encoding' => 'utf8',
'package' => 'IO::All'
};
...
... and sure enough,the field is there, as expected, too...
Well - I hope someone eventually puts a more qualified answer here; for the time being, I hope this is as good as a fix to my problems :) ...

Multiple data members in a perl class

I am new to perl and still learning oop in perl. I usually code in C, C++. It is required to bless an object to notify perl to search for methods in that package first. That's what bless does. And then every function call made with help of -> passes the instance itself as first parameter. Now I have a doubt in writing the constructor for a new object. Normally a constructor would normally look like:
sub new {
my %hash = {};
return bless {%hash}; #will automatically take this package as the class
}
Now I want to have two data members in my class so I can do something like this:
sub new {
my %hash = {};
$hash->{"table_header"} = shift #_; #add element to hash
$hash->{"body_content"} = shift #_;
return bless {%hash}; #will automatically take this package as the class
}
My question is that is this the only possible way. Can't we have multiple data members like in C and C++ and we do have to use strings like "table_header" and "body_content".
EDIT:
In C or C++ we can directly reference the data member(assume its public for now). Here there is one extra reference which has to be made. I wanted to know if there is any way we can have a C like object.
sub new {
my $table_header = shift #_;
my $body_content = shift #_;
#bless somehow
}
Hope this clears some confusion.
There are modules that make OOP in Perl easier. The most important is Moose:
use strict; use warnings;
package SomeObject;
use Moose; # this is now a Moose class
# declare some members. Note that everything is "public"
has table_header => (
is => 'ro', # read-only access
);
has body_content => (
is => 'rw', # read-write access
);
# a "new" method is autogenerated
# some method that uses these fields.
# Note that the members can only be accessed via methods.
# This guards against typos that can't be easily caught with hashes.
sub display {
my ($self) = #_;
my $underline = "=" x (length $self->table_header);
return $self->table_header . "\n" . $underline . "\n\n" . $self->body_content . "\n";
}
package main;
# the "new" takes keyword arguments
my $instance = SomeObject->new(
table_header => "This is a header",
body_content => "Some body content",
);
$instance->body_content("Different content"); # set a member
print $instance->display;
# This is a header
# ================
#
# Different content
If you get to know Moose, you will find an object system that is far more flexible than that in Java or C++, as it takes ideas from Perl6 and the Common Lisp Object System. Of course, this is fairly ugly, but it works well in practice.
Because of the way Perl OOP works, it isn't possible to have the instance members accessible as variables on their own. Well, almost. There is the experimental mop module which does exactly that.
use strict; use warnings;
use mop;
class SomeObject {
# Instance variables start with $!..., and behave like ordinary variables
# If you make them externally accessible with "is ro" or "is rw", then
# appropriate accessor methods are additionally generated.
# a private member with public read-only accessor,
# which has to be initialized in the constructor.
has $!table_header is ro = die 'Please specify a "table_header"!';
# a private member with public read-write accessor,
# which is optional.
has $!body_content is rw = "";
# new is autogenerated, as in Moose
method display() {
# arguments are handled automatically, so we could also do $self->table_header.
my $underline = "=" x (length $!table_header);
return "$!table_header\n$underline\n\n$!body_content\n";
}
}
# as seen in Moose
my $instance = SomeObject->new(
table_header => "This is a header",
body_content => "Some body content",
);
$instance->body_content("Different content"); # set a member, as in Moose
print $instance->display;
# This is a header
# ================
#
# Different content
Although it has pretty syntax, don't use mop right now for serious projects and stick to Moose instead. If Moose is too heavyweight for you, then you might enjoy lighter alternatives like Mouse or Moo (these three object systems are mostly compatible with each other).
You are getting confused between hashes and hash references. You are also forgetting that the first parameter to any method is the object reference or the name of the package. Perl constructors are inherited like any other method, so you must bless the new object into the correct package for polymorphism to work properly. This code is what you intended
sub new {
my $package = shift;
my %self;
$self{table_header} = shift;
$self{body_content} = shift;
bless \%self, $package;
}
I am not clear what you mean by “directly reference the data member”, but if you hoped that you could avoid writing $self everywhere so that every variable was implicitly an element of the hash then you cannot. Perl is far more flexible than most languages, and can use any blessed reference as an object instance. It is most common to use a hash, but occasionally a reference to an array, a scalar, or even a file handle is more appropriate. The cost of this flexibility is specifying exactly when you are referring to a member of the blessed hash. I don't see that it's too great a burden.
You can always write your code more concisely. The method above can be written
sub new {
my $package = shift;
my %self;
#self{qw/ table_header body_content /} = #_;
bless \%self, $package;
}

Moose traits for multdimensional data structures

Breaking out handling an internal variable from calls on the variable into calls on the object is easy using the Attribute::Native::Trait handlers. However, how do you deal with multiple data structures? I can't think of any way to handle something like the below without making the stash an arrayref of My::Stash::Attribute objects, which in turn contain an arrayref of My::Stash::Subattribute objects, which contains an arrayref My::Stash::Instance objects. This includes a lot of munging and coercing the data each level down the stack as I sort things out.
Yes, I can store the items as a flat array and then grep it on every read, but in a situation with frequent reads and that most calls are reads, grepping against a large list of array items is a lot of processing every read vs just indexing the items internally in the way needed.
Is there a MooseX extension that can handle this sort of thing via handlers creating methods, instead of just treating the read accessor as the hashref it is and modifying it in place? Or am I just best off forgetting about doing things like this via method call and just doing it as-is?
use strict;
use warnings;
use 5.010;
package My::Stash;
use Moose;
has '_stash' => (is => 'ro', isa => 'HashRef', default => sub { {} });
sub add_item {
my $self = shift;
my ($item) = #_;
push #{$self->_stash->{$item->{property}}{$item->{sub}}}, $item;
}
sub get_items {
my $self = shift;
my ($property, $subproperty) = #_;
return #{$self->_stash->{$property}{$subproperty}};
}
package main;
use Data::Printer;
my $stash = My::Stash->new();
for my $property (qw/foo bar baz/) {
for my $subproperty (qw/fazz fuzz/) {
for my $instance (1 .. 2) {
$stash->add_item({ property => $property, sub => $subproperty, instance => $instance })
}
}
}
p($_) for $stash->get_items(qw/baz fuzz/);
These are very esoteric:
sub add_item {
my $self = shift;
my ($item) = #_;
push #{$self->_stash->{$item->{property}}{$item->{sub}}}, $item;
}
So add_item takes an hashref item, and pushes it onto an array key in stash indexed by it's own keys property, and sub.
sub get_items {
my $self = shift;
my ($property, $subproperty) = #_;
return #{$self->_stash->{$property}{$subproperty}};
}
Conversely, get_item takes two arguments, a $property and a $subproperty and it retrieves the appropriate elements in a Array in a HoH.
So here are the concerns into making it MooseX:
There is no way in a non-Magic hash to insist that only hashes are values -- this would be required for predictable behavior on the trait. As in your example, what would you expect if _stash->{$property} resolved to a scalar.
add_item has it's depth hardcoded to property and sub.
returning arrays is bad, it requires all of the elements to be pushed onto the stack (return refs)
Now firstly, I don't see why a regular Moose Hash trait couldn't accept array refs for both the setter and getter.
->set( [qw/ key1 key2/], 'foo' )
->get( [qw/ key1 key2/] )
This would certainly make your job easier, if your destination wasn't an array:
sub add_item {
my ( $self, $hash ) = #_;
$self->set( [ $hash->{property}, $hash->{subproperty} ], $hash );
}
# get items works as is, just pass in an `ArrayRef`
# ie, `->get([$property, $subproperty])`
When it comes to having the destination be an array than a hash slot, I assume you'd just have to build that into a totally different helper in the trait, push_to_array_or_create([$property, $subproperty], $value). I'd still just retrieve it with the fictional get helper specified above. auto_deref type functionality is a pretty bad idea.
In short ask a core developer on what they would think about extending set and get in this context to accept ArrayRefs as keys and act appropriately. I can't imagine there is a useful default for ArrayRef keys (I don't think regular stringification would be too useful.).