Does plperlu reload Perl modules if they change? [duplicate] - perl

This question already has an answer here:
Does PostgreSQL keep its pl* interpreters loaded persistently?
(1 answer)
Closed 8 years ago.
If I wrote something in plperlu, and it used a Perl module (e.g. MyModule::Foo), when would that module be reloaded? Does it keep track of them like mod_perl's Apache2::Reload, so that a touch will cause a reinterpretation?

After some testing based on what Leon commented, apparently MyModule::Foo stays in memory from the first time it gets used successfully, but only within the current process (i.e., database handle).
If there are errors either in compiling it (plperlu complains when you define a function that use's it) or in running it (when you select from your function, for example), it will be reloaded. However, I can't see a way to force a reload within a process once the module has loaded successfully, not even by calling a different sub in the module that does error out.
Also, if you're accessing PostgreSQL via Apache::DBI, this means your cached handles won't pick up module changes unless you disconnect all the cached handles.
So I guess there's no way to force a check within a process, a la Apache2::Reload...
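For completeness, the usual in-process reload idiom in plain Perl is to delete the module's entry from %INC and require it again; this is essentially what Apache2::Reload automates. I haven't verified that plperlu tolerates this inside its embedded interpreter, so treat the sketch below (using the placeholder name MyModule::Foo from the question) as an assumption to test, not a confirmed workaround:

use strict;
use warnings;

# Force a fresh compile of a module in the current interpreter by making
# Perl forget that it was already loaded.
sub force_reload {
    my ($module) = @_;                       # e.g. 'MyModule::Foo'
    (my $file = "$module.pm") =~ s{::}{/}g;  # %INC key: 'MyModule/Foo.pm'
    delete $INC{$file};                      # forget the previous load
    require $file;                           # compile it again from disk
}

force_reload('MyModule::Foo');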

Cryptic Moo (Perl) Error "Attempt to bless into a reference at..."

Probably a long shot, but I'm wondering if anyone has seen an error like this before, as I cannot reproduce it outside of a production environment. Essentially the situation is as follows:
I have a module called My::Budget::Module (renamed for simplicity) which is responsible for updating the "budget" for a given object in the application.
The My::Budget::Module uses a Moo object that I built called My::Bulk::Update::Module, which does the following:
build up an array of database rows that need to be updated
build a MySQL update query string / statement which will update all rows at once
actually update all rows at once
The My::Bulk::Update::Module will then perform the update and mark the rows that have been updated as "stale" so that they will not be cached.
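To make the bulk-update idea concrete for readers, here is a purely hypothetical sketch of the kind of single-statement UPDATE such a module can emit; the table name, column names, and DSN are made up, and this is not the actual code from My::Bulk::Update::Module:

use strict;
use warnings;
use DBI;

# Hypothetical connection details for illustration only.
my $dbh = DBI->connect('dbi:mysql:database=app', 'user', 'password',
                       { RaiseError => 1 });

# [ id, new_budget ] pairs queued up for one bulk update.
my @rows = ( [ 1, 100 ], [ 2, 250 ], [ 3, 75 ] );

# One statement that updates every queued row at once via a CASE expression.
my $case   = join ' ', ('WHEN ? THEN ?') x @rows;
my $in     = join ',', ('?') x @rows;
my $query  = "UPDATE budgets SET budget = CASE id $case END WHERE id IN ($in)";
my @params = ( ( map { @$_ } @rows ), ( map { $_->[0] } @rows ) );

$dbh->do($query, undef, @params);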
The error always seems to occur somewhere after adding a row to be updated but before the code which actually applies the update returns.
If you look at the stack trace that I have included below you can see that the error takes the form
Attempt to bless into a reference at...
and the point at which this occurs is in the constructor in Moo/Object.pm, which is version 2.003002 of Moo from CPAN (see here).
Attempt to bless into a reference at /path/to/module/from/cpan/Moo/Object.pm line 25 at /path/to/module/from/cpan/Moo/Object.pm line 25.
Moo::Object::new(My::Bulk::Update::Module=HASH(0xf784b50)) called at (eval 1808) line 28
MongoDB::Collection::new(My::Bulk::Update::Module=HASH(0xf784b50)) called at /path/to/my/bulk/update/module line XXXX
My::Bulk::Update::Module::apply_bulk_update(My::Bulk::Update::Module=HASH(0xf784b50)) called at /path/to/my/budget/module line XXXX
My::Budget::Module::update_budget(My::Budget::Module=HASH(0xf699a38)) called at /path/to/my/budget/module line XXXX
Moving backwards through the stack trace leads to MongoDB::Collection, and this is where things start to get very weird.
MongoDB::Collection is also a CPAN module, but the module which appears at this point varies and I can't see a pattern here, except that it is always a Moo object. Moreover, I'm unsure why this module is being instantiated, as there is no call to MongoDB::Collection::new at the line mentioned.
In addition, from the stack trace it looks like MongoDB::Collection and Moo::Object are instantiated with the first argument being My::Bulk::Update::Module=HASH(0xf784b50). Given the application logic I do not believe MongoDB::Collection should be instantiated here nor should My::Bulk::Update::Module be passed to MongoDB::Collection at all.
Other than the fact that it is a Moo object, My::Bulk::Update::Module does not extend any other module and is designed to be a stand alone "utility" module. It is only used at one place in the entire application.
Has anyone seen something similar before?
EDIT: Adding some more code. apply_bulk_update doesn't do much at all. There is no call to MongoDB::Collection here; MongoDB::Collection just "happens" to be the module included in the stack trace in this particular example. It is not always MongoDB::Collection - I've also seen MongoDB::Timestamp, MongoDB::Cursor, Search::Elasticsearch::Serializer::JSON, Search::Elasticsearch::Logger::LogAny, etc.
sub apply_bulk_update
{
    my $self = shift;
    my ($db) = @_;                          # wrapper around DBI module
    my $query = $self->_generate_query();   # string UPDATE table SET...
    my $params = $self->_params;            # arrayref
    return undef unless $params && scalar @$params;
    $db->do($query, undef, @$params);
}
The code sometimes dies as soon as apply_bulk_update is called, sometimes on the call to _generate_query and sometimes after the query executes on the last line...
Just in case anyone was interested...
After a chunk of further debugging, the error was traced to the exact point where My::Bulk::Update::Module::apply_bulk_update or My::Bulk::Update::Module::_generate_query was called, but logging code inside these subroutines showed that they were not being executed as expected.
To determine what was going on, B::Deparse was used to rebuild the source code for the body of these subroutines (or at least the source code located at the memory address to which these subs were pointing).
After using this module, e.g.
B::Deparse->new->coderef2text(\&My::Bulk::Update::_generate_query)
it became obvious that the error occurred when My::Bulk::Update::_generate_query was pointing at a memory location which contained something entirely different (e.g. MongoDB::Collection::new).
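For anyone wanting to reproduce the technique, a minimal standalone use of B::Deparse looks like this (My::Bulk::Update::_generate_query is the renamed sub from the question; point it at whatever coderef you are suspicious of):

use strict;
use warnings;
use B::Deparse;

# Turn the optree behind a code reference back into (approximate) Perl
# source, showing what the coderef currently points at.
my $deparse = B::Deparse->new('-p');
print $deparse->coderef2text(\&My::Bulk::Update::_generate_query), "\n";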
This issue appears to have been solved upstream by the following commit in the Sub::Defer module (which is a dependency for Moo).
https://github.com/moose/Sub-Quote/commit/4a38f034366e79b76d29fec903d8e8d02ee01896
If you read the summary of the commit you can see the change that was made:
Prevent defer_info and undefer_sub from operating on expired subs.
Validate that the arguments to defer_info and undefer_sub refer to actual live subs. Use the weak refs we are storing to the deferred and undeferred subs to make sure the original subs are still alive, and we aren't returning data related to a reused memory address.
Also make sure we don't expire data related to unnamed subs. Since the user can capture the undeferred sub via undefer_sub, we can't track the expiry without using a fieldhash. For now, avoid introducing that complexity, since the amount we leak should not be that great.
Upgrading the version of Sub::Defer appears to have solved the issue.

How to preserve data between executions of program

I am running a Perl script on an HP-UX box. The script will execute every 15 minutes and will need to compare its results with the results of the last time it executed.
I will need to store two variables (IsOccuring and ErrorCount) between executions. What is the best way to do this?
Edit clarification:
It only compares the most recent execution to the current execution.
It doesn't matter if the value is lost between reboots.
And touching the filesystem is pretty much off limits.
If you can't touch the file system, try using a shared memory segment. There are helper modules for that like IPC::ShareLite, or you can use the shmget and related functions directly.
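A minimal sketch of that approach with IPC::ShareLite, assuming the module behaves as its synopsis describes (the key is arbitrary but must match across runs, and the two values are the ones named in the question):

use strict;
use warnings;
use IPC::ShareLite;

# Attach to (or create) a shared memory segment that outlives this run.
my $share = IPC::ShareLite->new(
    -key     => 1971,    # arbitrary; must be the same on every run
    -create  => 'yes',
    -destroy => 'no',    # leave the segment behind on exit
) or die "Cannot attach to shared memory: $!";

# Load last run's values (if any), update them, and store them back.
my $raw = $share->fetch;
my ($is_occuring, $error_count) = $raw ? split(/:/, $raw) : (0, 0);
# ... compare with the current run's results here ...
$share->store("$is_occuring:$error_count");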
You'll have to store them in a file. This sort of file is often kept in /tmp, but any place where the user running the cron job has access would do. Make sure your script can handle the case where the file is missing.
You could create a separate process running a "remember stuff" service over your choice of IPC mechanism. This sounds like a rather tortured solution to "I don't want to touch the disk", but if it's important enough to offset a couple of days of development work (realistically, if you are new to IPC, and HP-SUX continues to live up to its name), then by all means read man perlipc for a start.
Does it have to be completely re-executed? Can you just have it running in a loop and sleeping for 15 minutes between iterations? Then you don't have to worry about saving the values externally; the program never stops.
I definitely think IPC is the way to go here.
I'd save off the data in a file. Then, inside the script I'd load the last results if the file exists.
Use the Storable module to serialize Perl data structures, save them anywhere you want, and deserialize them during the next script execution.
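The question rules out touching the filesystem, but if a small state file turns out to be acceptable after all, Storable keeps this very short (the path below is just an example):

use strict;
use warnings;
use Storable qw(store retrieve);

my $state_file = '/var/tmp/myscript.state';   # example location

# Load last run's values if present, otherwise start fresh.
my $state = -e $state_file
    ? retrieve($state_file)
    : { IsOccuring => 0, ErrorCount => 0 };

# ... compare $state->{IsOccuring} / $state->{ErrorCount} with this run ...

# Persist this run's values for the next execution.
store($state, $state_file);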

Perl File::Tail synchronization

I'm having this situation:
I'm parsing some log files with a Perl daemon. This daemon writes data to a MySQL DB.
The log file can:
be rotated (solved by file size and some logic)
not exist (the 'ignore_nonexistant' parameter in File::Tail)
The daemon:
can be killed
can die for some reason
I'm using File::Tail to tail the file. For file rotation, the creation date or file size can help. But what mechanism should I use to start tailing from some position in the file? (Assume that there are a lot of such daemons and no write access to the filesystem.)
I've thought about a position variable in the DB, but this won't help me.
Maybe some mechanism to pass a position parameter to the parent process?
I just don't want to reinvent the wheel.
File::Tail already detects rotation and continues reading from the new file.
To deal with the daemon dying and restarting, can you query the database for the last record written when the daemon restarts, and just skip logfile lines until you get to a later line?
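A rough sketch of that idea with File::Tail; get_last_logged_line() and process_and_store() are made-up placeholders for whatever your schema and parser actually provide:

use strict;
use warnings;
use File::Tail;

# Hypothetical helper: fetch the last log line this daemon stored in MySQL
# before it died or was killed; returns undef on a cold start.
my $last_seen = get_last_logged_line();

my $tail = File::Tail->new(
    name               => '/var/log/myapp.log',   # example path
    ignore_nonexistant => 1,
    maxinterval        => 10,
    tail               => -1,   # re-read the existing file so we can find where we left off
);

my $caught_up = defined $last_seen ? 0 : 1;
while (defined(my $line = $tail->read)) {
    chomp $line;
    if (!$caught_up) {
        # Skip everything up to and including the last line already in the DB.
        $caught_up = 1 if $line eq $last_seen;
        next;
    }
    process_and_store($line);   # hypothetical: parse the line and INSERT it
}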
Try http://search.cpan.org/dist/Log-Unrotate/.
You'll have to implement your own Log::Unrotate::Cursor class if you wish to store position files in a DB instead of the local filesystem, but that should be trivial.
We wrote Log::Unrotate and have used it for 5 years in production, and it tries really hard to never skip any data. (It tries so hard that it throws an exception if your cursor becomes invalid, for example if the log got rotated several times while the reader wasn't working for some reason. You may want to enable the autofix_cursor option to change this behavior.)
Also take a look at http://search.cpan.org/dist/File-LogReader/. I've never used it, but it's supposed to solve the same task.

Perl MozRepl cleanup problem

I'm coding a web crawler and I've been using WWW::Mechanize::Firefox to navigate some pages that keep loading content after the page has loaded (for the others I use WWW::Mechanize), and I've never had an issue with that.
Yesterday I added DBI and DBD::mysql to the script, adding queries to export data to a database (this works perfectly), but suddenly MozRepl started giving this error:
(in cleanup) Can't call method "execute" on an undefined value at /Library/Perl/5.10.0/MozRepl.pm line 372 during global destruction.
(in cleanup) Can't call method "execute" on an undefined value at /Library/Perl/5.10.0/MozRepl.pm line 372 during global destruction.
and terminating the script after 1 cycle (it should run until it gets to the end of a specific text file, which it doesn't).
I haven't touched anything in this part of the script (I don't need to use the DB with those pages), at least not willingly. I checked with a file-comparison app and couldn't find anything.
Posting the code could be tricky, it's pretty long and I have no idea where the problem may lie.
EDIT
Sometimes it also gives this error instead of the previous one:
(in cleanup) Can't call method "cmd" on an undefined value at /Library/Perl/5.10.0/MozRepl/Client.pm line 186 during global destruction.
This has nothing to do with DBI or DBD::mysql. The messages are nothing to worry about, but I admit they are unsightly.
The messages come up as remaining Perl/JavaScript objects get destroyed in an unordered way during Perl global destruction. If you want to avoid them, destroy your $mech object before quitting your application:
undef $mech;
# end of program
If the $mech object is released before the program gets shut down, the Perl/Javascript bridge can also shut down in an orderly fashion.
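If the program has several exit points, one low-effort variant (assuming $mech is a file-scoped lexical) is to drop the reference in an END block, which runs before global destruction kicks in:

END {
    # Runs at normal program end, but before global destruction, so the
    # Perl/JavaScript bridge can still shut down in an orderly way.
    undef $mech if $mech;
}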
Also note that the preferred forum for questions about WWW::Mechanize::Firefox is http://perlmonks.org :)

Will inserting the same `<script>` into the DOM twice cause a second request in any browsers?

I've been working on a bit of JavaScript code that, under certain conditions, lazy-loads a couple of different libraries (Clicky Web Analytics and the Sizzle selector engine).
This script is downloaded millions of times per day, so performance optimization is a major concern. To date, I've employed a couple of flags like script_loading and script_loaded to try to ensure that I don't load either library more than once (by "load," I mean requesting the scripts after page load by inserting a <script> element into the DOM).
My question is: Rather than rely on these flags, which have gotten a little unwieldy and hard to follow in my code (think callbacks and all of the pitfalls of asynchronous code), is it cross-browser safe (i.e., back to IE 6) and not detrimental to performance to just call a simple function to insert a <script> element whenever I reach a code branch that needs one of these libraries?
The latter would still ensure that I only load either library when I need it, and would also simplify and reduce the weight of my code base, but I need to be absolutely sure that this won't result in additional, unnecessary browser requests.
My hunch is that appending a <script> element multiple times won't be harmful, as I assume browsers should recognize a duplicate src URL and rely on a local cached copy. But, you know what happens when we assume...
I'm hoping that someone is familiar enough with the behavior of various modern (and not-so-modern, such as IE 6) browsers to be able to speak to what will happen in this case.
In the meantime, I'll write a test to try to answer this first-hand. My hesitation is just that this may be difficult and cumbersome to verify with certainty in every browser that my script is expected to support.
Thanks in advance for any help and/or input!
Got an alternative solution.
At the point where you insert the new script element in the DOM, could you not do a quick scan of existing script elements to see if there is another one with the same src? If there is, don't insert another?
Javascript code on the same page can't run multithreaded, so you won't get any race conditions in the middle of this or anything.
Otherwise you are just relying on the caching behaviour of current browsers (and HTTP proxies).
The page is processed as a stream. If you load the same script multiple times, it will be run every time it is included. Obviously, due to the browser cache, it will be requested from the server only once.
I would stay away from this approach of inserting script tags for the same script multiple times.
The way I solve this problem is to have a "test" function for every script to see if it is loaded. E.g. for sizzle this would be "function() { return !!window['Sizzle']; }". The script tag is only inserted if the test function returns false.
Each time you add a script to your page, even if it has the same src, the browser may find it in the local cache or may ask the server whether the content has changed.
Using a variable to check whether the script is already included is a good way to reduce loading, and it's very simple.
For example, this may work for you:
var LOADED_JS = {};
function js_isIncluded(name) { // returns true if the JS is already loaded
    return LOADED_JS[name] !== undefined;
}
function include_js(name) {
    if (!js_isIncluded(name)) {
        YOUR_LAZY_LOADING_FUNCTION(name);
        LOADED_JS[name] = true;
    }
}
You can also get all script elements and check the src, but my solution is better because it has the speed and simplicity of a hash lookup, and the script src has an absolute path even if you set it with a relative path.
You may also want to initialize the map with the scripts normally loaded (without lazy loading) at page init, to avoid a double request.
For what it's worth, if you define the scripts as type="module", they will only be loaded and executed once.