perl: function name clash - perl

Let's say, there are two modules is our framework: Handler.pm and Queries.pm
Queries.pm is optional and is being loaded at fastcgi process startup
BEGIN {
&{"check_module_$_"} () foreach (Queries);
}
sub check_module_queries {
...
require Eludia::Content::Queries;
...
}
every module function is loaded in one common namespace
now, there are two functions with same name (setup_page_content) in Handler.pm and Queries.pm
setup_page_content is being called in Handler.pm
It looks like original author suggested that Queries.pm:setup_page_content will be called, whenever Queries.pm is loaded
Sometimes it doesn't happen: traceback (obtained via caller ()) in these cases indicates, that setup_page_content was called from module Handler.pm
I logged %INC just before call and it contains Queries.pm and it full path in these cases
This behaviour is inconsistent and pops like in 2-3% of attempts on production installation, mostly when I send two parallel identical http requests. Due amount of effort to reproduce, I doesn't determine yet, whether it is installation specific.
How it will be decided which version of function with same name will be called?
Is it well-defined behaviour?
There should be a reason, original author wrote the code this way
UPDATE
perl version is v5.10.1 built for x86_64-linux-gnu-thread-multi
UPDATE 2
code: Handler.pm and Queries.pm
Queries.pm loading occurs in check_module_queries (BEGIN {} of Eludia.pm),
loaded for every request using Loader.pm (via use Loader.pm <params>)

I'd like to call Custom.pm:setup_page_content, whenever Custom.pm is loaded
So you'd like to call Custom::setup_page_content when it exists.
if (exists(&Custom::setup_page_content)) {
Custom::setup_page_content(...);
} else {
Handler::setup_page_content(...);
}
Sometimes it doesn't happen.
The total lack of information about this prevents me from commenting. Code? Anything?
Is there a way to 'unload' a module but leave it in %INC?
Loading a module is simply running it. You can't "unrun" code. At best, you can manually undo some of the effects of running it. Whether that includes changing %INC or not is entirely up to you.

When there is a namespace collision, the most recently defined/loaded function will take precedence.

These are not proper modules. These are simply libraries. If you turn them into proper modules with proper name spacing you won't have this issue. You barely need to do anything more than:
package Queries;
sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
my $self = {};
bless ( $self, $class );
return $self;
}
And you'll be well on your way. Of course your program will need a couple changes to actually access the class functions. You can look at exporting to the functions to possibly save some refactoring time.
I have forked the code at github and initiated a pull request that I hope will address your problem.

The behaviour is perfectly defined: when you call PACKAGENAME::function_name, PACKAGENAME::function_name is called. It cannot be more than one PACKAGENAME::function_name at at time.
It is possible to redefine each function so many times as you wish. When you, say, execute some code like
eval "sub PACKAGENAME::function_name {...}";
the function is [re]defined. When you require some .pm file, it's much like eval the string with its content.
If PACKAGENAME is not defined explicitly, the __PACKAGE__ value is used.
So if you remark that the version of setup_page_content is taken from Handler.pm, it must indicate that Handler.pm's content was executed (by use, require, read/eval etc.) after Queries.pm was loaded (for the last time).
So you have to track all events of Handler.pm loading. The easy way is to add some debug print at the start of Handler.pm.

Related

Cryptic Moo (Perl) Error "Attempt to bless into a reference at..."

Probably a long shot but I'm wondering if anyone has seen an error like this before, as I can not reproduce it outside of a production environment. Essentially the situation is as follows:
I have a module called My::Budget::Module (renamed for simplicity) which is responsible for updating the "budget" for a given object in the application
The My::Budget::Module uses a Moo object that I built called My::Bulk::Update::Module which does the following:
build up an array of database rows that need to be updated
build a MySQL update query string / statement which will update all rows at once
actually update all rows at once
The My::Bulk::Update::Module will then perform the update and mark the rows that have been updated as "stale" so that they will not be cached
The error always seems to occur somewhere after adding a row to be updated but before the code which actually applies the update returns.
If you look at the stack trace that I have included below you can see that the error takes the form
Attempt to bless into a reference at...
and the point at which this occurs is in the constructor of Moo/Object.pm which is Version 2.003002 of Moo from cpan(see here).
Attempt to bless into a reference at /path/to/module/from/cpan/Moo/Object.pm line 25 at /path/to/module/from/cpan/Moo/Object.pm line 25.
Moo::Object::new(My::Bulk::Update::Module=HASH(0xf784b50)) called at (eval 1808) line 28
MongoDB::Collection::new(My::Bulk::Update::Module=HASH(0xf784b50)) called at /path/to/my/bulk/update/module line XXXX
My::Bulk::Update::Module::apply_bulk_update(My::Bulk::Update::Module=HASH(0xf784b50)) called at /path/to/my/budget/module line XXXX
My::Budget::Module::update_budget(My::Budget::Module=HASH(0xf699a38)) called at /path/to/my/budget/module line XXXX
Moving backwards through the stack trace leads to MongoDB::Collection & this is where things start to get very weird.
MongoDB::Collection is also a cpan module but the module which appears at this point varies and I can't see a pattern here except that it is always a Moo object. Moreover, I'm unsure why this module is being instantiated as there is no call to MongoDB::Collection::new at the line mentioned.
In addition, from the stack trace it looks like MongoDB::Collection and Moo::Object are instantiated with the first argument being My::Bulk::Update::Module=HASH(0xf784b50). Given the application logic I do not believe MongoDB::Collection should be instantiated here nor should My::Bulk::Update::Module be passed to MongoDB::Collection at all.
Other than the fact that it is a Moo object, My::Bulk::Update::Module does not extend any other module and is designed to be a stand alone "utility" module. It is only used at one place in the entire application.
Has anyone seen something similar before?
EDIT: Adding some more code - apply_bulk_update doesn't do much at all. There is no call to MongoDB::Collection here and MongoDB::Collection just "happens" to be the moudule included in the stack trace in this particular example. This is not always MongoDB::Collection - I've also seen MongoDB::Timestamp, MongoDB::Cursor, Search::Elasticsearch::Serializer::JSON, Search::Elasticsearch::Logger::LogAny etc etc
sub apply_bulk_update
{
my $self = shift;
my ($db) = #_; # wrapper around DBI module
my $query = $self->_generate_query(); # string UPDATE table SET...
my $params = $self->_params; # arrayref
return undef unless $params && scalar #$params;
$db->do($query, undef, #$params);
}
The code sometimes dies as soon as apply_bulk_update is called, sometimes on the call to _generate_query and sometimes after the query executes on the last line...
Just in case anyone was interested...
After a chunk of further debugging the error was traced to the exact point where My::Bulk::Update::Module::apply_bulk_update or My::Bulk::Update::Module::_generate_query was called but logging code inside these subroutines determined that they were not being executed as expected.
To determine what was going on B::Deparse was used to rebuild the source code for the body of these subroutines (or at least the source code located at the memory address to which these subs were pointing)
After using this library e.g.
B::Deparse->new->coderef2text(\&My::Bulk::Update::_generate_query)
it became obvious that the error occurred when My::Bulk::Update::_generate_query was pointing at a memory location which contained something entirely different (i.e. MongoDB::Collection::new etc).
This issue appears to have been solved upstream by the following commit in the Sub::Defer module (which is a dependency for Moo).
https://github.com/moose/Sub-Quote/commit/4a38f034366e79b76d29fec903d8e8d02ee01896
If you read the summary of the commit you can see the change that was made:
Prevent defer_info and undefer_sub from operating on expired subs. Validate that the arguments to defer_info and undefer_sub refer to
actual live subs. Use the weak refs we are storing to the deferred and
undeferred subs to make sure the original subs are still alive, and we
aren't returning data related to a reused memory address. Also make sure we don't expire data related to unnamed subs. Since the
user can capture the undeferred sub via undefer_sub, we can't track the
expiry without using a fieldhash. For now, avoid introducing that
complexity, since the amount we leak should not be that great.
Upgrading the version of Sub::Defer appears to have solved the issue.

Unexpected behavior in nested recursive function

I have some code that behaves rather strangely.
I am inside a function, and I declare a nested one, which should check if something isn't okay. If it's not then it should sleep for five seconds and call itself again.
sub stop {
sub wait_for_stop {
my $vm_view = shift;
if ( $vm_view->runtime->powerState->val ne "poweredOff" ) {
debug("...");
sleep(5);
wait_for_stop();
}
}
debug("Waiting for the VM to stop");
wait_for_stop( #$vm_views[0] );
}
So, in the call that results in the recursion inside the if condition, if I put the parameter (as the function definition expects it), like this:
wait_for_stop($vm_view);
I get an infinite loop.
If I leave it without a parameter, as in the code example above, it works as expected.
Shouldn't $vm_view in the subsequent calls be empty? Or the last used value ($vm_view->runtime->powerState->val)? Either case should result in unexpected behavior and error.
But it works without any parameter. So why is that? Is there something I've missed from perldoc?
EDIT1: Actually, the value of $vm_views does get changed, so that's not the reason for the infinite loop.
General clarifications
I am using the VMware SDK. The $vm_views object contains the VM details. I am polling one of its methods to detect a change, in this particular case, I need to know when the machine is turned off. So, for lack of a better way, I make a call every 5 seconds until the value is satisfactory.
My purpose is to stop the VM, make modifications that can only be made while it's off, and then launch it.
Actual question
When I don't pass a parameter, the block works as expected – it waits until the value is poweredOff (the VM is off), and continues, which doesn't make much sense, at least to me.
In the case I put $vm_view as parameter, I get an infinite loop (the value will still get changed, since I'm calling a method).
So I am wondering why the function works, when after the first call, $vm_view should be undef, and therefore, be stuck in an infinite loop? [undef ne "poweredOff" -> sleep -> recursion till death]
And why, when I pass the expected value, it gets stuck?
PS: To all those saying my recursion is weird and useless in this scenario – due to a myriad of reasons, I need to use such a format (it's better suited for my needs, since, after I get this bit working, I'll modify it to add various stuff and reuse it, and, for what I have in mind, a function seems to be the best option).
You should always look at your standard tools before going for something a little more exotic like recursion. All you need here is a while loop
It's also worth noting that #$vm_views[0] should be $$vm_views[0]) or, better, $vm_views->[0]. And you don't gain anything by defining a subroutine inside another one -- the effect is the same as if it was declared separately afterwards
An infinite loop is what I would expect if $vm_view->runtime->powerState->val never returns poweredOff, and the code below won't fix that. I don't see any code that tells the VM to stop before you wait for the status change. Is that correct?
I don't understand why you say that your code works correctly when you call wait_for_stop without any parameters. You will get the fatal error
Can't call method "runtime" on an undefined value
and your program will stop. Is what you have posted the real code?
This will do what you intended. I also think it's easier to read
use strict;
use warnings;
my $vm_views;
sub stop {
debug ("Waiting for the VM to stop");
my $vm_view = $vm_views->[0];
while ( $vm_view->runtime->powerState->val ne 'poweredOff' ) {
debug('...');
sleep 5;
}
}
I think you would have a better time not calling wait_for_stop() recursively. This way might serve you better:
sub stop
{
sub wait_for_stop
{
my $vm_view = shift;
if ($vm_view->runtime->powerState->val ne "poweredOff")
{
debug("...");
#sleep(5);
#wait_for_stop();
return 0;
}
return 1;
}
debug ("Waiting for the VM to stop");
until(wait_for_stop(#$vm_views[0]))
{
sleep(5);
}
}
Your old way was rather confusing and I don't think you were passing the $vm_view variable through to the recursive subroutine call.
Update:
I tried reading it here:
https://www.vmware.com/support/developer/viperltoolkit/doc/perl_toolkit_guide.html#step3
It says:
When the script is run, the VI Perl Toolkit runtime checks the
environment variables, configuration file contents, and command-line
entries (in this order) for the necessary connection-setup details. If
none of these overrides are available, the runtime uses the defaults
(see Table 1 ) to setup the connection.
So, the "runtime" is using the default connection details even when a vm object is not defined? May be?
That still doesn't answer why it doesn't work when the parameter is passed.
You need to understand the VM SDK better. You logic for recursion and usage of function parameters are fine.
Also, the page: https://www.vmware.com/support/developer/viperltoolkit/doc/perl_toolkit_guide.html
says -
VI Perl Toolkit views have several characteristics that you should
keep in mind as you write your own scripts. Specifically, a view:
Is a Perl object
Includes properties and methods that correlate to the properties and
operations of the server-side managed object
Is a static copy of one or more server-side managed objects, and as
such (static), is not updated automatically as the state of objects
on the server changes.
So what the "vm" function returns is a static copy, which can be updated from the script. May be it is getting updated when you make a call while passing the $vm_view?
Old Answer:
Problem is not what you missed from Perl docs. The problem is with your understanding of recursion.
The purpose of recursion is to keep running until $vm_view->runtime->powerState->val becomes "PoweredOff" and then cascade back. If you don't update the value, it keeps running forever.
When you say:
I get an infinite loop.
Are you updating the $vm_view within the if condition?
Otherwise, the variable is same every time you call the function and hence you can end up in infinite loop.
If I leave it without a parameter, as in the code example above, it
works as expected.
How can it work as expected? What is expected? There is no way the function can know what value your $vm_view is being updated with.
I have simplified the code, added updating a simple variable (similar to your $vm_view) and tested. Use this to understand what is happening:
sub wait_for_stop
{
my $vm_view = shift;
if ($vm_view < 10){
print "debug...\n";
sleep(5);
$vm_view++; // update $vm_view->runtime->powerState->val here
// so that it becomes "poweredOff" at some point of time
// and breaks the recursion
wait_for_stop($vm_view);
}
}
wait_for_stop(1);
Let me know in comments how the variable is being updated and I will help resolve.

Looking for a clearer Carp::croak()

I recently noticed that I'm prepending __PACKAGE__ and sub/method name to most croak()'s messages because it makes tracking down errors easier. So I started writing a _croak() wrapper that adds this by default (using caller(1)).
E.g.
sub _croak {
my ($msg) = shift // '';
$msg = (caller 1)[3].': '.$msg
unless ref $msg;
Carp::croak($msg);
};
Now every (textual) exception is attributed both to the point where my module was misused (e.g. bad parameter passed), and to the module itself.
And the question: is there a standard module/technique for this? (Full stack traces aka confess() are overkill most of the time).
Instead of getting croak to add caller information, I let my logger do it. Log4perl let's me set the format for the messages I care to log. croak does its job, and the logger lets me peek at what's going on.
A standard technique is to keep generating simple exceptions, and turn them into stacktraces only when needed by loading Carp::Always::Color from the command-line.

Is my Rose::DB::Object compile-time too slow?

I'm planning to move from Class::DBI to Rose::DB::Object due to its nice structure and the jargon that RDBO is faster compares to CDBI and DBIC.
However on my machine (linux 2.6.9-89, perl 5.8.9) RDBO compiled time is much slower than CDBI:
$ time perl -MClass::DBI -e0
real 0m0.233s
user 0m0.208s
sys 0m0.024s
$ time perl -MRose::DB::Object -e0
real 0m1.178s
user 0m1.097s
sys 0m0.078s
That's a lot different...
Anyone experiences similar behaviour here?
Cheers.
#manni and #john: thanks for the explanation about the modules referenced by RDBO, it surely answers why the compile-time is slower than CDBI.
The application is not running on a persistent environment. In fact it's invoked by several simultaneous cron jobs that run at 2 mins, 5 mins, and x mins interval - so yes, compile-time is crucial here...
Jonathan Rockway's App::Persistent seems interesting, however its (current) limitation to allow only one application running at a time is not suitable for my purpose. Also it has issue when we kill the client, the server process is still running...
Rose::DB::Object simply contains (or references from other modules) much more code than Class::DBI. On the bright side, it also has many more features and is much faster at runtime than Class::DBI. If compile time is concern for you, then your best bet is to load as little code as possible (or get faster disks).
Another option is to set auto_load_related_classes to false in your Metadata objects. To do this early enough and globally will probably require you to make a Metadata subclass and then set that as the meta_class in your common Rose::DB::Object base class.
Turning auto_load_related_classes off means that you'd have to manually load related classes that you actually want to use in your script. That's a bit of a pain, but it lets you control how many classes get loaded. (If you have heavily interrelated classes, loading a single one can end up pulling all the other ones in.)
You could, perhaps, have an environment variable to control the behavior. Example metadata class:
package My::DB::Object::Metadata;
use base 'Rose::DB::Object::Metadata';
# New class method to handle default
sub default_auto_load_related_classes
{
return $ENV{'RDBO_AUTO_LOAD_RELATED_CLASSES'} ? 1 : 0
}
# Override existing object method, honoring new class-defined default
sub auto_load_related_classes
{
my($self) = shift;
return $self->SUPER::auto_load_related_classes(#_) if(#_);
if(defined(my $value = $self->SUPER::auto_load_related_classes))
{
return $value;
}
# Initialize to default
return $self->SUPER::auto_load_related_classes(ref($self)->default_auto_load_related_classes);
}
And here's how it's tied to your common object base class:
package My::DB::Object;
use base 'Rose::DB::Object';
use My::DB::Object::Metadata;
sub meta_class { 'My::DB::Object::Metadata' }
Then set RDBO_AUTO_LOAD_RELATED_CLASSES to true when you're running in a persistent environment, and leave it false (and don't forget to explicitly load related classes) for command-line scripts.
Again, this will only help if you're currently loading more classes than you strictly need in a particular script due to the default true value of the auto_load_related_classes Metadata attribute.
If compile time is an issue, there are methods to lessen the impact. One is PPerl which makes a normal Perl script into a daemon that is compiled once. The only change you need to make (after installing it, of course) is to the shebang line:
#!/usr/bin/pperl
Another option is to code write a client/server model program where the bulk of the work is done by a server that loads the expensive modules and a thin script that just interacts with the server over sockets or pipes.
You should also look at App::Persistent and this article, both of which were written by Jonathan Rockway (aka jrockway).
This looks almost as dramatic over here:
time perl -MClass::DBI -e0
real 0m0.084s
user 0m0.080s
sys 0m0.004s
time perl -MRose::DB::Object -e0
real 0m0.391s
user 0m0.356s
sys 0m0.036s
I'm afraid part of the difference can simply be explained by the number of dependencies in each module:
perl -MClass::DBI -le 'print scalar keys %INC'
46
perl -MRose::DB::Object -le 'print scalar keys %INC'
95
Of course, you should ask yourself how much compilation time really matters for your particular problem. And what source code would be easier to maintain for you.

How can I replace all 'die's with 'confess' in a Perl application?

I'm working in a large Perl application and would like to get stack traces every time 'die' is called. I'm aware of the Carp module, but I would prefer not to search/replace every instance of 'die' with 'confess'. In addition, I would like full stack traces for errors in Perl modules or the Perl interpreter itself, and obviously I can't change those to use Carp.
So, is there a way for me to modify the 'die' function at runtime so that it behaves like 'confess'? Or, is there a Perl interpreter setting that will throw full stack traces from 'die'?
Use Devel::SimpleTrace or Carp::Always and they'll do what you're asking for without any hard work on your part. They have global effect, which means they can easily be added for just one run on the commandline using e.g. -MDevel::SimpleTrace.
What about setting a __DIE__ signal handler? Something like
$SIG{__DIE__} = sub { Carp::confess #_ };
at the top of your script? See perlvar %SIG for more information.
I usually only want to replace the dies in a bit of code, so I localize the __DIE__ handler:
{
use Carp;
local $SIG{__DIE__} = \&Carp::confess;
....
}
As a development tool this can work, but some modules play tricks with this to get their features to work. Those features may break in odd ways when you override the handler they were expecting. It's not a good practice, but it happens sometimes.
The Error module will convert all dies to Error::Simple objects, which contain a full stacktrace (the constructor parses the "at file... line.." text and creates a stack trace). You can use an arbitrary object (generally subclassed from Error::Simple) to handle errors with the $Error::ObjectifyCallback preference.
This is especially handy if you commonly throw around exceptions of other types to signal other events, as then you just add a handler for Error::Simple (or whatever other class you are using for errors) and have it dump its stacktrace or perform specialized logging depending on the type of error.