I am writing a Perl script that gets data from a changing outside source. As data is added or removed from this outside source I would like the data structures in my Perl script to mirror the changes. I was able to do this by creating objects that store the data in a similar fashion and synchronizing the data in those objects whenever I try to access the data in my Perl script.
This works great and picks up any new data added to the external source; however, a problem arises when data is removed. If data is removed, I want all existing references to it in my Perl script to be destroyed as well, so that any attempt to access the data raises an error.
The only way I could think of was to undefine the internal reference to the data as soon as it is determined that the data no longer exists. It appears, however, that undefining a reference doesn't remove the data it points to; it only clears the variable that holds the reference.
Here is a test script demonstrating my problem:
#!/usr/bin/perl
# Always use these
use strict;
use warnings;
####################################################################################################
# Package to create an object
package Object;
use Moose;
# Define attributes
has 'name' => (is => 'ro', isa => 'Str', required => 1);
####################################################################################################
# Package to test with
package test;
# Create an object
my $object1 = Object->new('name' => 'Test Object');
print 'OBJECT1 NAME: '.$object1->name()."\n";
# Create another reference to the object
my $object2 = $object1;
# Print the name
print 'OBJECT2 NAME: '.$object2->name()."\n";
# Print both references
print "\n";
print "OBJ : $object1\n";
print "OBJ2 : $object2\n";
print "\n";
# Undefine the reference to object2
undef $object2;
# Try to print both names
print 'OBJECT1 NAME: '.$object1->name()."\n";
print 'OBJECT2 NAME: '.$object2->name()."\n";
Output: both names print, the two stringified references are identical, and after the undef the OBJECT1 line still prints its name, while the final line dies with "Can't call method "name" on an undefined value".
How can I completely destroy all traces of an object so that any attempt to access its data will result in an error?
EDIT:
Here is a different example that may explain better what I am trying to achieve.
Say I have a file object:
my $file = File->new();
Now I want to get the text in that file
my $text = $file->text();
Now I get it again (for some unknown reason)
my $text2 = $file->text();
I want to be able to modify $text and have it directly affect the contents of $text2, as well as change the actual text attribute of the file.
I'm basically trying to tie the variables together so if one changes they all change. Also if one is deleted they would all be deleted.
This would also mean that if the text attribute is changed, $text and $text2 would change with it to reflect the new value.
Could this be done using an alias of some sort?
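For example, a rough sketch of the behaviour I'm after, using plain references on a hash that stands in for the hypothetical File object:

my $file  = { text => 'original contents' };   # stand-in for File->new()
my $text  = \$file->{text};    # both variables hold a reference
my $text2 = \$file->{text};    # to the same underlying scalar
$$text = 'new contents';       # modify through one reference...
print $$text2, "\n";           # ...and the other sees the change
print $file->{text}, "\n";     # as does the attribute itself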
Perl uses reference counting to "retire" data.
What your program is doing is as follows:
Create an object and assign a reference to $object1
Copy that reference to $object2 and increment the object's reference count
Change the value of $object2 and decrement the object's reference count.
In the end you still have a hold of the object via $object1, and its reference count will not drop below 1 while you keep hold of it. Technically, as your program shuts down, the $object1 variable is destroyed as it goes out of scope, at which point the object's reference count drops to 0 and perl looks for and tries to call its DESTROY method.
If you truly want to "see" an item destroyed, you may want to look into defining a DESTROY method that prints a message upon object destruction. That way you can "let go" of your last reference and still see its destruction.
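For example, a minimal sketch; since the example class uses Moose, the destructor hook to define is DEMOLISH (Moose installs its own DESTROY, which calls DEMOLISH):

package Object;
use Moose;
has 'name' => (is => 'ro', isa => 'Str', required => 1);
sub DEMOLISH {
    my $self = shift;
    warn $self->name . " destroyed\n";   # fires when the last reference is released
}

package main;
my $object1 = Object->new(name => 'Test Object');
my $object2 = $object1;   # reference count is now 2
undef $object2;           # count drops to 1 -- nothing printed yet
undef $object1;           # count drops to 0 -- warns "Test Object destroyed"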
You can't free an object that's still being used. Have the object itself keep track of whether it's still valid or not.
package Class;
use strict;
use warnings;
use Carp qw(croak);

sub new {
    my ($class, $data) = @_;
    my $self = bless({}, $class);
    $self->{valid} = 1;
    $self->{data} = $data;
    return $self;
}

sub delete {
    my $self = shift;
    undef(%$self);   # empty the hash; every remaining reference now sees an invalid object
}

sub data {
    my $self = shift;
    croak("Invalid object") if !$self->{valid};
    $self->{data} = $_[0] if @_;
    return $self->{data};
}
Example:
my $o1 = Class->new('big_complex_data');
my $o2 = $o1;
say $o1->data();
say $o2->data();
$o1->delete();
say $o2->data();
Output:
big_complex_data
big_complex_data
Invalid object at a.pl line 39.
I am trying to make a tinier, more optimized CGI module for Perl.
When working on the init subroutine to get, or set default file data, I run into an issue primarily caused with the MIME::Types module.
In reality, my init subroutine is supposed to get the blessed variables from the new constructor, initialize MIME::Types, use MIME::Types to return the MIME::Type of $type via mimeTypeOf(), determine if the MIME::Type is binary, and print out the Content-Type and Content-Disposition.
However, when trying to do something of that nature, I get the error in the title.
This is my current code, which in theory should work.
sub new {
    my ($class, %args) = @_;
    my $self = {};
    my $type = $args{type} // 'text/html'; # If type isn't defined, default to HTML content.
    my $attachment = ($args{attachment} // 'false') eq 'true' ? 'attachment' : 'inline'; # Same, but with disposition instead.
    $self->{type} = $type;
    $self->{attachment} = $attachment;
    bless $self, $class;
    return $self;
}
sub init {
    my $self = shift;
    CORE::state $type = shift // $self->{type}; # If there are no overrides, just use $self->type;
    # print $self->type; prints text/html when not specified.
    CORE::state $attachment = shift // $self->{attachment}; # Same as above, but with $self->attachment;
    # print $self->attachment; prints inline when not specified.
    my $types = MIME::Types->new;
    my $mime = $types->mimeTypeOf($type);
    if ($mime->isBinary) {
        $self->{attachment} = 'attachment';
    } else {
        $self->{attachment} = ('inline' or $attachment);
    }
    die "Warning: Binary content types should not be sent inline!\r\n" if ($mime->isBinary && $attachment eq 'inline');
    print "Content-Type: $mime\r\nContent-Disposition: $self->{attachment}\r\n\r\n";
    return;
}
Even printing the type returns a defined value; in fact, it prints out what it's supposed to when nothing is passed to the method. This is the code in my main CGI file.
#!C:\Strawberry\perl\bin\perl.exe
use strict;
use warnings;
use TinyCGI::Core;
# Create a new TinyCGI::Core object
my $cgi = TinyCGI::Core->new();
# Initialize the TinyCGI::Core object
$cgi->init();
print 'Hello, World!';
This worked perfectly fine without MIME::Types, and even works just fine with stuff actually defined within the object.
I've also tried not using CORE::state.
You are using MIME::Types incorrectly. It looks like you want to take the value from an HTTP header (or maybe multi-part body header) and figure out what it is. You are using mimeTypeOf, which expects a file extension or filename, when you should be using type, which expects a type string.
Either way, there's a chance that you could get back an undef value because MIME::Types might not be able to map the string to a type. You should guard against that:
my $type = $mime->type( $self->{type} );
if( defined $type ) { ... }
Unrelated to that, you are using CORE::state for some reason. That makes it look like you are defining some method or subroutine named state and expecting it to conflict with the Perl built-in.
But I don't think you want state here. You have an instance method init that is for some reason persisting a value based on an earlier created, unrelated instance. I think you want my, since you are merely giving a local name to something that's already inside the invoking instance. You don't want to persist that to the next instance that calls this.
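A minimal sketch of init along those lines, combining my with the guarded type lookup (the die message is just illustrative):

sub init {
    my $self       = shift;
    my $type       = shift // $self->{type};        # plain lexicals instead of state
    my $attachment = shift // $self->{attachment};
    my $types = MIME::Types->new;
    my $mime  = $types->type($type);   # look up by type string, not by file extension
    die "Unknown MIME type: $type\r\n" unless defined $mime;
    $attachment = 'attachment' if $mime->isBinary;
    print "Content-Type: $mime\r\nContent-Disposition: $attachment\r\n\r\n";
    return;
}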
My web application uses a Log module to record various events. The log object is initialized by being passed a CGI::Session object containing various information about the user. This information gets copied to the log object's data fields. Owing to the volume of activity on the site and the fact that a single visit to the site can result in multiple loggable events, the log module currently caches all events in memory, then actually writes them to the log file in the DESTROY function. However, this results in the session parameters being frozen at the time the log object is initialized, which occurs at the beginning of the request.
Recently, some new parameters were required to be logged that a) would be stored in the session object, and b) needed to be logged as their final, instead of initial value (and would potentially change during execution). My initial idea was to instead store a reference to the session object in the log object, but as the DESTROY function typically is called in global destruction, I have no guarantee that the session object will still be defined when the log is destroyed. Is there a way to guarantee that the CGI::Session object won't be destroyed before my log, hopefully without having to add an explicit destruct to each page of the app?
#old
package Log;
use Data::Dumper;

sub new
{
    my $class = shift;
    my $session = shift; # CGI::Session
    my $self = {session => {customer_id => $session->param('customer_id')}, events => []};
    return bless $self, $class;
}

sub log_event
{
    my $self = shift;
    my $event = shift;
    push @{$self->{'events'}}, {event_type => $event->{'type'}, timestamp => $event->{'timestamp'}};
}

sub DESTROY
{
    my $self = shift;
    if (scalar @{$self->{'events'}})
    {
        open LOG, '>>', '/tmp/log' or return; # append mode; skip logging if the file cannot be opened
        print LOG Dumper({session => $self->{'session'}, events => $self->{'events'}});
        close LOG;
    }
}
#new
package Log;
use Data::Dumper;

sub new
{
    my $class = shift;
    my $session = shift; # CGI::Session
    my $self = {session => $session, events => []};
    return bless $self, $class;
}

sub log_event
{
    my $self = shift;
    my $event = shift;
    push @{$self->{'events'}}, {event_type => $event->{'type'}, timestamp => $event->{'timestamp'}};
}

sub DESTROY
{
    my $self = shift;
    if (scalar @{$self->{'events'}})
    {
        open LOG, '>>', '/tmp/log' or return;
        print LOG Dumper({session => {customer_id => $self->{'session'}->param('customer_id')}, events => $self->{'events'}});
        close LOG;
    }
}
Perl uses reference-counting to govern object destruction[1]. This means that under normal circumstances, if object A references object B, object A will be destroyed before object B.
This fails if the object survives until global destruction. There are two circumstances where this happens:
Reference cycles. If two objects directly or indirectly reference each other, the order of destruction of the objects involved in the cycle is unpredictable.
Global variables. The order in which objects referenced by package variables (and objects directly or indirectly referenced by them) are destroyed is unpredictable (though Perl does try to do the right thing).
So if the log holds a reference to a session object (as it appears you are doing), the log will be destroyed first (within the limits I mentioned above).
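To illustrate the first case, a minimal sketch of a cycle and how Scalar::Util::weaken breaks it:

use Scalar::Util qw( weaken );

my $parent = { name => 'parent' };
my $child  = { name => 'child' };
$parent->{child} = $child;
$child->{parent} = $parent;   # cycle: neither count can reach 0 on its own
weaken($child->{parent});     # the weak link no longer keeps $parent alive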
If the objects are in package variables instead of lexical (my) variables, and if you don't want to change that, you could use the following in the main program:
use Sub::ScopeFinalizer qw( scope_finalizer );
# Here or wherever.
our $log;
our $session;
{
    # The lexicals within these curlies will get destroyed before
    # global destruction. This will lead to the code ref provided to
    # scope_finalizer getting called before global destruction.
    my $guard = scope_finalizer {
        # The order doesn't matter because the log object
        # holds a reference to the session object.
        $log = undef;
        $session = undef;
    };

    # ... Main program here ...
}
As long as the log holds a reference to the session, this will ensure that the objects are destroyed in the correct order. Even if the program dies or exits.
You can tell if objects of a class are surviving until global destruction by adding the following code to the program and examining the order of the output:
DESTROY { warn($_[0]->id . " destroyed.\n"); } # In the class
END { warn("Global destruction."); }
In this post, I'm using the word "object" loosely. I don't just mean instances of classes, but also scalars, arrays, hashes, subs and other values and variables.
Similarly, referencing refers not just to a Perl reference, but other forms of referencing. For example, arrays and hashes reference their elements, and subs reference variables they capture, etc.
Perl currently supports copy-on-write (CoW) for scalar variables; however, it doesn't appear to have anything for hashrefs and arrayrefs.
Perl does, however, have subroutines to modify variable internals, like weaken, so I'm guessing a solution might exist.
I have a situation where a package returns a large structure and also keeps it as internal state. I want to ensure that if either the returned reference or the internal reference (currently they are the same reference) is modified, I end up with a copy-on-write situation: the underlying data is copied, the copy is modified, and the reference used for the modification is updated to point at the new data.
package SomePackage;
use Moose;

has some_large_internal_variable_ref => (
    'is'  => 'rw',
    'isa' => 'HashRef',
);

sub some_operation {
    my ($self) = @_;
    $self->some_large_internal_variable_ref({
        # create some large result that is different every time
    });
}

sub get_result {
    my ($self) = @_;
    return $self->some_large_internal_variable_ref;
}

1;
use strict;
use warnings;
use SomePackage;
use Test::More;
# Situation 1 where the internally stored reference is modified
# This will pass!
my $package = SomePackage->new();
$package->some_operation();
my $result1 = $package->get_result();
$package->some_operation();
my $result2 = $package->get_result();
isnt($result1, $result2, "These two references should no longer be the same");
# Situation 2 where the externally stored reference is modified
# This will fail
$package = SomePackage->new();
$package->some_operation();
$result1 = $package->get_result();
$result1->{foo} = "bar";
$result2 = $package->get_result();
isnt($result1, $result2, "These two references should no longer be the same");
done_testing;
I'm trying to avoid having to clone the values on the get_result return, as that would double memory usage.
I'm hoping there is some weaken-like call I can make on the variable to indicate that any modification should behave with copy-on-write semantics.
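The closest thing I've found so far is locking the structure so that a stray write dies instead of silently mutating shared state; that's not CoW, but it at least surfaces the problem. A minimal sketch using the core Hash::Util module:

use Hash::Util qw( lock_hashref );

sub get_result {
    my ($self) = @_;
    my $ref = $self->some_large_internal_variable_ref;
    lock_hashref($ref);   # keys and values become read-only; writes die
    return $ref;          # caveat: this freezes the internal copy too, as it is the same hash
}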
When you create a Row object in DBIx::Class you can pass related objects as values, e.g.
my $author = $authors_rs->find(1);
my $book = $books_rs->create({ author => $author, title => 'title' });
However, if you later use the author accessor, the object is retrieved again from the database. Is it possible to create an object so that the associated object can be accessed without the additional query?
How about just copying what you want from $author into some plain old Perl variables?
If you want to copy a whole structure, the Clone module may help (I don't have experience with this module; I just found it online).
I'm not sure that I'm understanding you correctly, but if I am, perhaps you want to look into the prefetch functionality to automatically be ready for calling those other related row objects.
For example, in Galileo, when listing all pages (articles) I use this mechanism to get the related author objects for each page object (see here).
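A rough sketch of that approach, assuming the books resultset has a relationship named author and the author has a name column (both names are illustrative):

# Fetch books together with their authors in a single query; the author
# accessor is then served from the prefetched data, with no extra query.
my $books_with_authors = $books_rs->search(
    {},
    { prefetch => 'author' },
);
while ( my $book = $books_with_authors->next ) {
    print $book->author->name, "\n";   # no additional database hit
}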
OK, so if the point is to store one object in another, you might want to inject some extra data into the object.
UNTESTED:
## some initial checks (run these only once)
# check that the method name is available
die "Cannot use method 'extra_data'" if $book->can('extra_data');
# check that the reftype is a hash
require Scalar::Util;
die "Incorrect underlying type" unless Scalar::Util::reftype($book) eq 'HASH';
# check that the key is available
die "Key unavailable" if exists $book->{'my_extra_data'};

{
    no strict 'refs';
    # create a simple accessor for a hash stored in an object
    *{ ref($book) . '::extra_data' } = sub {
        my $self = shift;
        # return all extra data if called without args
        return $self->{my_extra_data} unless @_;
        my $key = shift;
        if (@_) {
            $self->{my_extra_data}{$key} = shift;
        }
        return $self->{my_extra_data}{$key};
    };
}
$book->extra_data( author => $author );
#then later
my $stored_author = $book->extra_data('author');
I hope this is something straightforward that I'm doing wrong. I saw something online about "variable suicide" that looked good, but it was for an older version and I'm on 5.10.1.
Anyway - a variable that I declared - $RootDirectory - just suddenly loses its value, and I can't figure out why.
Here's a script to reproduce the problem. When I run through the script in debug mode (perl -d) I can get it to print out the $RootDirectory at line 21 and 26. But it's gone by line 30.
use strict;

my $RootDirectory;
my @RootDirectories;
@RootDirectories = (
    'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\'
    ,'c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\'
    ,'c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\'
);

foreach $RootDirectory (@RootDirectories) {
    # $RootDirectory = 'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\';
    # print ' In foreach ' . $RootDirectory. "\n";
    RunSchema ();
}
exit(0);

sub RunSchema() {
    # print ' In RunSchema ' . $RootDirectory. "\n";
    CreateTables ();
}

sub CreateTables() {
    # print ' In CreateTables ' . $RootDirectory. "\n";
    SQLExecFolder ('tbl');
}

sub SQLExecFolder() {
    print ' In SQLExecFolder ' . $RootDirectory. "\n"; # Variable $RootDirectory value is gone by now
}
EDIT
Thanks for all the comments! I think for now I'll use the "our" keyword, which appears to work well - thanks, Nathan. Also thanks, toolic, for the use warnings tip - I think I'm sold on that one!
The thing that continues to confuse me is why, when I did debug mode (perl -d), and stepped through the code, doing "p $RootDirectory" I got the expected output at lines 21 and 26, but not line 30. How is the situation different at line 30?
Also, I appreciate the comments about best practice being to pass $RootDirectory as a function parameter. I wanted to avoid that because I have so many functions following that - i.e. RunSchema calls CreateTables which calls SQLExecFolder. All of them would have to have the same parameter passed. Does it still make sense in this case, or are there any better ways to structure this?
What Nathan said is correct. That aside, why don't you pass in the value? It's better practice anyway:
foreach $RootDirectory (@RootDirectories) {
    # $RootDirectory = 'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\';
    # print ' In foreach ' . $RootDirectory. "\n";
    RunSchema ($RootDirectory);
}

sub SQLExecFolder {
    my $RootDirectory = shift;
    print ' In SQLExecFolder ' . $RootDirectory. "\n";
}
You're declaring $RootDirectory as the loop variable in a foreach loop. As far as I understand, that means that its value is localized to the loop, and its value is restored to its previous value at the end of the loop.
In your case the variable was never assigned, so at the end of the loop it returns to its previous value of undef.
Edit: Actually, the problem is that $RootDirectory is declared with my, so it is undefined in other scopes. In the functions RunSchema, CreateTables and SQLExecFolder the variable is undefined, regardless of the localization of the foreach.
If you want the variable to be declared for strictness, but want it to be global, declare $RootDirectory with our:
our $RootDirectory;
Edit: That being said, it's not always a good idea to use a global variable. You're better off passing the variable as a parameter to the functions as others have suggested.
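A minimal sketch of that localization:

my $dir = 'outer value';
foreach $dir ('a', 'b') {
    print "$dir\n";   # prints 'a', then 'b'
}
print "$dir\n";       # prints 'outer value' -- restored when the loop exits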
Others have answered your question correctly. I just want to emphasize that you should add use warnings; to your code. It would have given a clue to your problem, and it would alert you to another potential hazard.
foreach variable is special - it's local to the loop.
If the variable is preceded with the keyword my, then it is lexically scoped, and is therefore visible only within the loop. Otherwise, the variable is implicitly local to the loop and regains its former value upon exiting the loop. If the variable was previously declared with my, it uses that variable instead of the global one, but it's still localized to the loop. This implicit localisation occurs only in a foreach loop.
Please take a look here
The iterator variable in foreach loop is always localized to the loop. See the foreach section in perlsyn. You can pass it to a subroutine as a parameter.
RE: When to use a global variable?
Global variables are risky because they can be changed at any time by any part of the code that accesses them. In addition, it is difficult to track when and where a change occurs, which makes it harder to track down unintentional consequences from modification. In short, each global variable increases coupling between the subroutines that use it.
When does it make sense to use a global? When the benefits outweigh the risks.
If you have many different values needed by most or all of your subroutines, it seems like a good time to use global variables. You can simplify every subroutine invocation, and make the code clearer, right?
WRONG. In this case the right approach is to aggregate all those distinct variables into one container data structure. So instead of foo( $frob, $grizzle, $cheese, $omg, $wtf ); you have foo( $state, $frob ), where $state = { grizzle => $grizzle, cheese => $cheese, omg => $omg, wtf => $wtf }.
So now we have one variable to pass around. All those sub calls are much simpler. Yet, even so, this is onerous and you still want to clean up the extra argument from each routine.
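Sketched concretely, with the illustrative names from the call above:

my $state = {
    grizzle => $grizzle,
    cheese  => $cheese,
    omg     => $omg,
    wtf     => $wtf,
};
foo( $state, $frob );

sub foo {
    my ($state, $frob) = @_;
    print $state->{cheese}, "\n";   # every routine reads the shared state the same way
}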
At this point you have several options:
Make $state global and just access it directly.
Make $state into a configuration object and use methods to control access to attributes.
Make the whole module into a class, and store all the state information in an object.
Option 1 is acceptable for small scripts with few routines. The risk of hard to debug errors is small.
Option 2 makes sense when there is no obvious relationship between the different routines in the module. Using a global state object helps because it decreases coupling between code that accesses it. It is also easier to add logging to track changes to the global data.
Option 3 works well if you have a group of closely related functions that operate on the same data.
Your sample code seems like a good candidate for option 3. I created a class called MySchema, and all the subroutines that operate on a specific directory are now methods. The invoking object carries the data it needs with it.
Now we have nice, clean code and no globals.
use strict;
use warnings;

my @directories = (
    'c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\',
    'c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\',
    'c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\',
);

for my $schema ( make_schemata(@directories) ) {
    $schema->run;
}

sub make_schemata {
    my @schemata = map { MySchema->new( directory => $_ ) } @_;
    return @schemata;
}

BEGIN {
    package MySchema;
    use Moose;

    has 'directory' => (
        is       => 'ro',
        isa      => 'Str',
        required => 1,
    );

    sub run {
        my $self = shift;
        $self->create_tables;
    }

    sub create_tables {
        my $self = shift;
        $self->sql_exec_folder('tbl');
    }

    sub sql_exec_folder {
        my $self = shift;
        my $dir = $self->directory;
        print "In SQLExecFolder $dir\n";
    }

    1;
}
As a bonus, the code in the BEGIN block can be removed and placed in a separate file for reuse by another script. All it needs to be a full-fledged module is to live in its own file named MySchema.pm.
Not a bad effort. Here are a couple of small improvements, and one "fix", which is to pass the variable to the subroutines as a function parameter, because $RootDirectory is scoped (i.e. restricted) to the foreach loop. In general it is also considered good practice to make explicit which variables are passed to and/or accessed by the various subroutines.
use strict;
use warnings;

sub RunSchema {
    my $root_dir = shift;
    CreateTables($root_dir);
}

sub CreateTables {
    my $root_dir = shift;
    SQLExecFolder('tbl', $root_dir);
}

sub SQLExecFolder {
    my ($name, $root_dir) = @_;
}
######################################################
my @RootDirectories = qw(
    c:\\P4\\EDW\\PRODEDW\\EDWDM\\main\\db\\
    c:\\P4\\EDW\\PRODEDW\\EDWADS\\main\\db\\
    c:\\P4\\EDW\\PRODEDW\\FJE\\main\\db\\
);

foreach my $RootDirectory (@RootDirectories) {
    # print ' In foreach ' . $RootDirectory. "\n";
    RunSchema($RootDirectory);
}
exit(0);