Is my Rose::DB::Object compile-time too slow?

Is my Rose::DB::Object compile-time too slow? - perl

I'm planning to move from Class::DBI to Rose::DB::Object due to its nice structure and the jargon that RDBO is faster compares to CDBI and DBIC.
However on my machine (linux 2.6.9-89, perl 5.8.9) RDBO compiled time is much slower than CDBI:
$ time perl -MClass::DBI -e0
real 0m0.233s
user 0m0.208s
sys 0m0.024s
$ time perl -MRose::DB::Object -e0
real 0m1.178s
user 0m1.097s
sys 0m0.078s
That's a lot different...
Anyone experiences similar behaviour here?
Cheers.
#manni and #john: thanks for the explanation about the modules referenced by RDBO, it surely answers why the compile-time is slower than CDBI.
The application is not running on a persistent environment. In fact it's invoked by several simultaneous cron jobs that run at 2 mins, 5 mins, and x mins interval - so yes, compile-time is crucial here...
Jonathan Rockway's App::Persistent seems interesting, however its (current) limitation to allow only one application running at a time is not suitable for my purpose. Also it has issue when we kill the client, the server process is still running...

Rose::DB::Object simply contains (or references from other modules) much more code than Class::DBI. On the bright side, it also has many more features and is much faster at runtime than Class::DBI. If compile time is concern for you, then your best bet is to load as little code as possible (or get faster disks).
Another option is to set auto_load_related_classes to false in your Metadata objects. To do this early enough and globally will probably require you to make a Metadata subclass and then set that as the meta_class in your common Rose::DB::Object base class.
Turning auto_load_related_classes off means that you'd have to manually load related classes that you actually want to use in your script. That's a bit of a pain, but it lets you control how many classes get loaded. (If you have heavily interrelated classes, loading a single one can end up pulling all the other ones in.)
You could, perhaps, have an environment variable to control the behavior. Example metadata class:
package My::DB::Object::Metadata;
use base 'Rose::DB::Object::Metadata';
# New class method to handle default
sub default_auto_load_related_classes
{
return $ENV{'RDBO_AUTO_LOAD_RELATED_CLASSES'} ? 1 : 0
}
# Override existing object method, honoring new class-defined default
sub auto_load_related_classes
{
my($self) = shift;
return $self->SUPER::auto_load_related_classes(#_) if(#_);
if(defined(my $value = $self->SUPER::auto_load_related_classes))
{
return $value;
}
# Initialize to default
return $self->SUPER::auto_load_related_classes(ref($self)->default_auto_load_related_classes);
}
And here's how it's tied to your common object base class:
package My::DB::Object;
use base 'Rose::DB::Object';
use My::DB::Object::Metadata;
sub meta_class { 'My::DB::Object::Metadata' }
Then set RDBO_AUTO_LOAD_RELATED_CLASSES to true when you're running in a persistent environment, and leave it false (and don't forget to explicitly load related classes) for command-line scripts.
Again, this will only help if you're currently loading more classes than you strictly need in a particular script due to the default true value of the auto_load_related_classes Metadata attribute.

If compile time is an issue, there are methods to lessen the impact. One is PPerl which makes a normal Perl script into a daemon that is compiled once. The only change you need to make (after installing it, of course) is to the shebang line:
#!/usr/bin/pperl
Another option is to code write a client/server model program where the bulk of the work is done by a server that loads the expensive modules and a thin script that just interacts with the server over sockets or pipes.
You should also look at App::Persistent and this article, both of which were written by Jonathan Rockway (aka jrockway).

This looks almost as dramatic over here:
time perl -MClass::DBI -e0
real 0m0.084s
user 0m0.080s
sys 0m0.004s
time perl -MRose::DB::Object -e0
real 0m0.391s
user 0m0.356s
sys 0m0.036s
I'm afraid part of the difference can simply be explained by the number of dependencies in each module:
perl -MClass::DBI -le 'print scalar keys %INC'
46
perl -MRose::DB::Object -le 'print scalar keys %INC'
95
Of course, you should ask yourself how much compilation time really matters for your particular problem. And what source code would be easier to maintain for you.

Related

VHDL Bus Functional Modelling - Can't put groups of procedures into a package to clean up the code

I want to organize a working bus functional model and push commonly used procedures (which look like CPU subroutines) out into a package and get them out of the main cpu model, but I'm stuck.
The procedures don't have access to the hardware bits when they're pushed out in a package.
In Verilog, I would put commonly used procedures out into an include file and link them into the CPU model as required for a given test suite.
More details:
I have a working bus functional model of a CPU, for simulation test benching.
At the "user interface" level I have a process called "main" running inside the CPU model which calls my predefined "instruction set" like this:
cpu_read(address, read_result);
cpu_write(address, write_data);
etc.
I bundle groups of those calls up into higher level procedures like
configure_communication_bus;
clear_all_packet_counters;
etc.
At the next layer these generic functions call a more hardware specific version which knows the interface timing for the design,
and those procedures then use an input record and output record to connect to the hardware module ports and waggle the cpu bus signals as required.
cpu_read calls hardware_cpu_read(cpu_input_record, cpu_output_record, address);
Something like this:
procedure cpu_read (address : in std_logic_vector(15 downto 0);
read_result : out std_logic_vector(31 downto 0));
begin
hardware_cpu_read(cpu_input_record, cpu_output_record, address, read_result);
end procedure;
The cpu_input_record and cpu_output_record are declared as signals of type nnn_record in the cpu model vhdl file.
So this is all working, but every single one of these procedures is all stored in the cpu VHDL module file, and all in the procedure declaration section so that they are all in the same scope.
If I share the model with team members they will need to add their own testing subroutines, and those also are all in the same location in the file, as well, their simulation test code has to go into the "main" process along with mine.
I'd rather link in various tests from outside the model, and only keep model specific procedures in the model file..
Ironically I can push the lowest level hardware procedure out to a package, and call those procedures from within the "main" process, but the higher level processes can't be put out into that package or any other packages because they don't have access to the cpu_read_record and cpu_write_record.
I feel like there must be a simple way to clean up this code and make it modular, and I'm just missing something obvious.
I don't really think making a command interpreter and loading my test code into a behavioral ROM is the right way to go by the way. Nor is fighting with the simulator interface to connect up a C program, but I may break down and try this..

Quick sketch of an answer (to the question I think you are asking! :-) though I may be off-beam...
To move the BFM subprograms into a reusable package, they need to be independent of the execution scope - that usually means a long parameter list for each of them. So using them in a testbench quickly gets tedious compared with the parameterless (or parameter-lite) versions you have now..
The usual workaround is to implement the BFM in a package, with long parameter lists.
Then write parameter-lite local equivalents (wrappers) in the execution scope, which simply call the package versions supplying all the parameters explicitly.
This is just boilerplate - not pretty but it does allow you to move the BFM into a package. These wrappers can be local to the testbench, to a process within it, or even to a subprogram within that process.
(The parameter types can be records for tidiness : these are probably declared in a third package, shared between BFM. TB, and synthesisable device under test...)
Thanks to overloading, there is no ambiguity between the local and BFM package versions, so the actual testbench remains as simple as possible.
Example wrapper function :
function cpu_read(address : unsigned) return slv_32 is
begin
return BFM_pack.cpu_read (
address => address,
rd_data_bus => tb_rd_data_bus,
wait => tb_wait_signal,
oe => tb_mem_oe,
-- ditto for all the signals constants variables it needs from the tb_ scope
);
end cpu_read;

Currently your test procedures require two extra signals on them, cpu_input_record and cpu_output_record. This is not so bad. It is not uncommon to just have these on all procedures that interact with the cpu and be done with it. So use hardware_cpu_read and not cpu_read. Add cpu_input_record, cpu_output_record to your configure_communication_bus and clear_all_packet_counters procedures and be done. Perhaps choose shorter names.
I do a similar approach, except I use only one record with resolved elements. To make this work, you need to initialize the record so that all elements are non-driving (ie: 'Z' for std_logic). To make this more flexible, I have created resolution functions for integer, time, and real. However, this only saves you one signal. Not a real huge win. Perhaps half way to where you think you want to be. But it is more work than what you are doing.
For VHDL-201X, we are working on syntax to allow parameters/ports automatically map to a identically named signal. This will get you to where you want to be with any of the approaches (yours, mine, or Brian's without the extra wrapper subprogram). It is posted here: http://www.eda.org/twiki/bin/view.cgi/P1076/ImplicitConnections. Given this, I would add the two records to your procedures and call it good enough for now.
Once you get by this problem, you seem to also be asking is how do I write separate tests using the same testbench. For this I use multiple architectures - I like to think of these as a Factory Class for concurrent code. To make this feasible, I separate the stimulus generation code from the rest of the testbench (typically: netlist connections and clock). My presentation, "VHDL Testbench Techniques that Leapfrog SystemVerilog", has an overview of this architecture along with a number of other goodies. It is available at: http://www.synthworks.com/papers/index.htm

You're definitely on the right track, in fact I have a variant like this (what you describe).
The catch is, now I build up a whole subroutine using the "parameter light" procedures, and those are what I want to put in a package to share and reuse. The problem is that any procedure pushed out to a package can't call to the parameter light procedures in the main vhdl file..
So what happens is we have one main vhdl file with all the common CPU hardware setup routines, and every designer's test code all in the same vhdl file..
Long story short, putting our test subroutines into separate files is really what I was hoping for..

Which is better in PHP: suppress warnings with '#' or run extra checks with isset()?

For example, if I implement some simple object caching, which method is faster?
1. return isset($cache[$cls]) ? $cache[$cls] : $cache[$cls] = new $cls;
2. return #$cache[$cls] ?: $cache[$cls] = new $cls;
I read somewhere # takes significant time to execute (and I wonder why), especially when warnings/notices are actually being issued and suppressed. isset() on the other hand means an extra hash lookup. So which is better and why?
I do want to keep E_NOTICE on globally, both on dev and production servers.

I wouldn't worry about which method is FASTER. That is a micro-optimization. I would worry more about which is more readable code and better coding practice.
I would certainly prefer your first option over the second, as your intent is much clearer. Also, best to keep away edge condition problems by always explicitly testing variables to make sure you are getting what you are expecting to get. For example, what if the class stored in $cache[$cls] is not of type $cls?
Personally, if I typically would not expect the index on $cache to be unset, then I would also put error handling in there rather than using ternary operations. If I could reasonably expect that that index would be unset on a regular basis, then I would make class $cls behave as a singleton and have your code be something like
return $cls::get_instance();

The isset() approach is better. It is code that explicitly states the index may be undefined. Suppressing the error is sloppy coding.
According to this article 10 Performance Tips to Speed Up PHP, warnings take additional execution time and also claims the # operator is "expensive."
Cleaning up warnings and errors beforehand can also keep you from
using # error suppression, which is expensive.
Additionally, the # will not suppress the errors with respect to custom error handlers:
http://www.php.net/manual/en/language.operators.errorcontrol.php
If you have set a custom error handler function with
set_error_handler() then it will still get called, but this custom
error handler can (and should) call error_reporting() which will
return 0 when the call that triggered the error was preceded by an #.
If the track_errors feature is enabled, any error message generated by
the expression will be saved in the variable $php_errormsg. This
variable will be overwritten on each error, so check early if you want
to use it.

# temporarily changes the error_reporting state, that's why it is said to take time.
If you expect a certain value, the first thing to do to validate it, is to check that it is defined. If you have notices, it's probably because you're missing something. Using isset() is, in my opinion, a good practice.

I ran timing tests for both cases, using hash keys of various lengths, also using various hit/miss ratios for the hash table, plus with and without E_NOTICE.
The results were: with error_reporting(E_ALL) the isset() variant was faster than the # by some 20-30%. Platform used: command line PHP 5.4.7 on OS X 10.8.
However, with error_reporting(E_ALL & ~E_NOTICE) the difference was within 1-2% for short hash keys, and up 10% for longer ones (16 chars).
Note that the first variant executes 2 hash table lookups, whereas the variant with # does only one lookup.
Thus, # is inferior in all scenarios and I wonder if there are any plans to optimize it.

I think you have your priorities a little mixed up here.
First of all, if you want to get a real world test of which is faster - load test them. As stated though suppressing will probably be slower.
The problem here is if you have performance issues with regular code, you should be upgrading your hardware, or optimize the grand logic of your code rather than preventing proper execution and error checking.
Suppressing errors to steal the tiniest fraction of a speed gain won't do you any favours in the long run. Especially if you think that this error may keep happening time and time again, and cause your app to run more slowly than if the error was caught and fixed.

Why do I need to know how many tests I will be running with Test::More?

Am I a bad person if I use use Test::More qw(no_plan)?
The Test::More POD says
Before anything else, you need a testing plan. This basically declares how many tests your script is going to run to protect against premature failure...
use Test::More tests => 23;
There are rare cases when you will not know beforehand how many tests your script is going to run. In this case, you can declare that you have no plan. (Try to avoid using this as it weakens your test.)
use Test::More qw(no_plan);
But premature failure can be easily seen when there are no results printed at the end of a test run. It just doesn't seem that helpful.
So I have 3 questions:
What is the reasoning behind requiring a test plan by default?
Has anyone found this a useful and time saving feature in the long run?
Do other test suites for other languages support this kind of thing?

What is the reason for requiring a test plan by default?
ysth's answer links to a great discussion of this issue which includes comments by Michael Schwern and Ovid who are the Test::More and Test::Most maintainers respectively. Apparently this comes up every once in a while on the perl-qa list and is a bit of a contentious issue. Here are the highlights:
Reasons to not use a test plan
Its annoying and takes time.
Its not worth the time because test scripts won't die without the test harness noticing except in some rare cases.
Test::More can count tests as they happen
If you use a test plan and need to skip tests, then you have the additional pain of needing a SKIP{} block.
Reasons to use a test plan
It only takes a few seconds to do. If it takes longer, your test logic is too complex.
If there is an exit(0) in the code somewhere, your test will complete successfully without running the remaining test cases. An observant human may notice the screen output doesn't look right, but in an automated test suite it could go unnoticed.
A developer might accidentally write test logic so that some tests never run.
You can't really have a progress bar without knowing ahead of time how many tests will be run. This is difficult to do through introspection alone.
The alternative
Test::Simple, Test::More, and Test::Most have a done_testing() method which should be called at the end of the test script. This is the approach I take currently.
This fixes the problem where code has an exit(0) in it. It doesn't fix the problem of logic which unintentionally skips tests though.
In short, its safer to use a plan, but the chances of this actually saving the day are low unless your test suites are complicated (and they should not be complicated).
So using done_testing() is a middle ground. Its probably not a huge deal whatever your preference.
Has this feature been useful to anyone in the real world?
A few people mention that this feature has been useful to them in the real word. This includes Larry Wall. Michael Schwern says the feature originates with Larry, more than 20 years ago.
Do other languages have this feature?
None of the xUnit type testing suites has the test plan feature. I haven't come across any examples of this feature being used in any other programming language.

I'm not sure what you are really asking because the documentation extract seems to answer it. I want to know if all my tests ran. However, I don't find that useful until the test suite stabilizes.
While developing, I use no_plan because I'm constantly adding to the test suite. As things stabilize, I verify the number of tests that should run and update the plan. Some people mention the "test harness" catching that already, but there is no such thing as "the test harness". There's the one that most modules use by default because that's what MakeMaker or Module::Build specify, but the TAP output is independent of any particular TAP consumer.
A couple of people have mentioned situations where the number of tests might vary. I figure out the tests however I need to compute the number then use that in the plan. It also helps to have small test files that target very specific functionality so the number of tests is low.
use vars qw( $tests );
BEGIN {
$tests = ...; # figure it out
use Test::More tests => $tests;
}
You can also separate the count from the loading:
use Test::More;
plan tests => $tests;
The latest TAP lets you put the plan at the end too.

In one comment, you seem to think prematurely exiting will count as a failure, since the plan won't be output at the end, but this isn't the case - the plan will be output unless
you terminate with POSIX::_exit or a fatal signal or the like. In particular, die() and exit() will result
in the plan being output (though the test harness should detect anything other than an exit(0) as a prematurely terminated test).
You may want to look at Test::Most's deferred plan option, soon to be in Test::More (if it's not already).
There's also been discussion of this on the perl-qa list recently. One thread: http://www.nntp.perl.org/group/perl.qa/2009/03/msg12121.html

Doing any testing is better than doing no testing, but testing is about being deliberate. Stating the number tests expected gives you the ability to see if there is a bug in the test script that is preventing a test from executing (or executing too many times). If you don't run tests under specific conditions you can use the skip function to declare this:
SKIP: {
skip $why, $how_many if $condition;
...normal testing code goes here...
}

I think it's ok to bend the rules and use no_plan when the human cost of figuring out the plan is too high, but this cost is a good indication that the test suite has not been well designed.
Another case where it's useful to have the test_plan explicitely defined is when you are doing this kind of tests:
$coderef = sub { my $arg = shift; isa_ok $arg, 'MyClass' };
do(#args, $coderef);
and
## hijack our interface to test it's called.
local *MyClass::do = $coderef;
If you don't specify a plan, it's easy to miss out that your test failed and that some assertions weren't run as you expected.

Having explicitly the number of test in the plan is a good idea, unless it is too expensive to retrieve this number. The question has been properly answered already but I wanted to stress two points:
Better than no_plan is to use done_testing()
use Test::More;
... run your tests ...;
done_testing( $number_of_tests_run );
# or done_testing() if not number of test is known
this Matt Trout blog entry is interesting, and rants about adding a plan vs cvs conflicts and other issues that make the plan problematic: Why numeric test plans are bad, wrong, and don't actually help anyway

I find it annoying, too, and I usually ignore the number at the very beginning until the test suite stabilizes. Then I just keep it up to date manually. I do like the idea of knowing how many total tests there are as the seconds tick by, as a kind of a progress indicator.
To make counting easier I put the following before each test:
#----- load non-existant record -----
....
#----- add a new record -----
....
#----- load the new record (by name) -----
....
#----- verify the name -----
etc.
Then I can quickly scan the file and easily count the tests, just looking for the #----- lines. I suppose I could even write something up in Emacs to do it for me, but it's honestly not that much of a chore.

It is a pain when doing TDD, because you are writing new tests opportunistically. When I was teaching TDD and the shop used Perl, we decided to use our test suite the no plan way. I guess we could have changed from no_plan to lock down the number of tests. At the time I saw it as more hindrance than help.

Eric Johnson's answer is exactly correct. I just wanted to add that done_testing, a much better replacement to no_plan, was released in Test-Simple 0.87_1 recently. It's an experimental release, but you can download it directly from the previous link.
done_testing allows you to declare the number of tests you think you've run at the end of your testing script, rather than trying to guess it before your script starts. You can read the documentation here.

Intra-process coordination in mod_perl under the worker MPM

I need to do some simple timezone calculation in mod_perl. DateTime isn't an option. What I need to do is easily accomplished by setting $ENV{TZ} and using localtime and POSIX::mktime, but under a threaded MPM, I'd need to make sure only one thread at a time was mucking with the environment. (I'm not concerned about other uses of localtime, etc.)
How can I use a mutex or other locking strategy to serialize (in the non-marshalling sense) access to the environment? The docs I've looked at don't explain well enough how I would create a mutex for just this use. Maybe there's something I'm just not getting about how you create mutexes in general.
Update: yes, I am aware of the need for using Env::C to set TZ.

(repeating what I said over at PerlMonks...)
BEGIN {
my $mutex;
sub that {
$mutex ||= APR::ThreadMutex->new( $r->pool() );
$mutex->lock();
$ENV{TZ}= ...;
...
$mutex->unlock();
}
}
But, of course, lock() should happen in a c'tor and unlock() should happen in a d'tor except for one-off hacks.
Update: Note that there is a race condition in how $mutex is initialized in the subroutine (two threads could call that() for the first time nearly simultaneously). You'd most likely want to initialize $mutex before (additional) threads are created but I'm unclear on the details on the 'worker' Apache MPM and how you would accomplish that easily. If there is some code that gets run "early", simply calling that() from there would eliminate the race.
Which all suggests a much safer interface to APR::ThreadMutex:
BEGIN {
my $mutex;
sub that {
my $autoLock= APR::ThreadMutex->autoLock( \$mutex );
...
# Mutex automatically released when $autoLock destroyed
}
}
Note that autoLock() getting a reference to undef would cause it to use a mutex to prevent a race when it initializes $mutex.

Because of this issue, mod_perl 2 actually deals with the %ENV hash differently than mod_perl 1. In mod_perl 1 %ENV was tied directly to the environ struct, so changing %ENV changed the environment. In mod_perl 2, the %ENV hash is populated from environ, but changes are not passed back.
This means you can no longer muck with $ENV{TZ} to adjust the timezone -- particularly in a threaded environment. The Apache2::Localtime module will make it work for the non-threaded case (by using Env::C) but when running in a threaded MPM that will be bad news.
There are some comments in the mod_perl source (src/modules/perl/modperl_env.c) regarding this issue:
/* * XXX: what we do here might change:
* - make it optional for %ENV to be tied to r->subprocess_env
* - make it possible to modify environ
* - we could allow modification of environ if mpm isn't threaded
* - we could allow modification of environ if variable isn't a CGI
* variable (still could cause problems)
*/
/*
* problems we are trying to solve:
* - environ is shared between threads
* + Perl does not serialize access to environ
* + even if it did, CGI variables cannot be shared between threads!
* problems we create by trying to solve above problems:
* - a forked process will not inherit the current %ENV
* - C libraries might rely on environ, e.g. DBD::Oracle
*/

If you're using apache 1.3, then you shouldn't need to resort to mutexes. Apache 1.3 spawns of a number of worker processes, and each worker executes a single thread. In this case, you can write:
{
local $ENV{TZ} = whatever_I_need_it_to_be();
# Do calculations here.
}
Changing the variable with local means that it reverts back to the previous value at the end of the block, but is still passed into any subroutine calls made from within that block. It's almost certainly what you want. Since each process has its own independent environment, you won't be changing the environment of other processes using this technique.
For apache 2, I don't know what model it uses with regards to forks and threads. If it keeps the same approach of forking off processes and having a single thread each, you're fine.
If apache 2 uses honest to goodness real threads, then that's outside my area of detailed knowledge, but I hope another lovely stackoverflow person can provide assistance.
All the very best,
Paul

Is there a way to have managed processes in Perl (i.e. a threads replacement that actually works)?

I have a multithreded application in perl for which I have to rely on several non-thread safe modules, so I have been using fork()ed processes with kill() signals as a message passing interface.
The problem is that the signal handlers are a bit erratic (to say the least) and often end up with processes that get killed in inapropriate states.
Is there a better way to do this?

Depending on exactly what your program needs to do, you might consider using POE, which is a Perl framework for multi-threaded applications with user-space threads. It's complex, but elegant and powerful and can help you avoid non-thread-safe modules by confining activity to a single Perl interpreter thread.
Helpful resources to get started:
Programming POE presentation by Matt Sergeant (start here to understand what it is and does)
POE project page (lots of cookbook examples)
Plus there are hundreds of pre-built POE components you can use to assemble into an application.

You can always have a pipe between parent and child to pass messages back and forth.
pipe my $reader, my $writer;
my $pid = fork();
if ( $pid == 0 ) {
close $reader;
...
}
else {
close $writer;
my $msg_from_child = <$reader>;
....
}
Not a very comfortable way of programming, but it shouldn't be 'erratic'.

Have a look at forks.pm, a "drop-in replacement for Perl threads using fork()" which makes for much more sensible memory usage (but don't use it on Win32). It will allow you to declare "shared" variables and then it automatically passes changes made to such variables between the processes (similar to how threads.pm does things).

From perl 5.8 onwards you should be looking at the core threads module. Have a look at http://metacpan.org/pod/threads
If you want to use modules which aren't thread safe you can usually load them with a require and import inside the thread entry point.

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse