Why would you assign to $$? - perl

perldoc perl5150delta says:
$$ can be assigned to
$$ was made read-only in Perl 5.8.0. But only sometimes: "local $$"
would make it writable again. Some CPAN modules were using "local $$"
or XS code to bypass the read-only check, so there is no reason to keep
$$ read-only. (This change also allowed a bug to be fixed while
maintaining backward compatibility.)
$$ is the current process ID, so why in the world would you assign to it?

There are only a couple (literally) of places in CPAN where people want to assign to $$, and it's mostly for testing (I haven't understood IPC::Messaging yet). I don't like this feature, especially since there's a much better way to get the same effect. The Perl 5 Porters added this feature because they could, rather than making those couple of cases do a better job of testing. If you read the p5p thread, it's obvious that this feature wasn't driven by need.
I wrote about it in Hide low-level details behind an interface.
However, I could be wrong on this because I'm not that good at the low-level black magic. I know there is a need to coordinate PIDs, but so far I think that $$ isn't the only way to do that. If someone has a use case that they can explain to me, I'll update that post.
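As an illustration of the interface-based alternative, here is a sketch with a hypothetical current_pid wrapper (this is not code from that post, just the general idea): code asks a function you control for the pid instead of reading $$ directly, so tests can override the function rather than the global.

use strict;
use warnings;

# Hypothetical wrapper: callers use current_pid() instead of $$ directly.
sub current_pid { $$ }

sub lockfile_name {
    my ($dir) = @_;
    return "$dir/app." . current_pid() . ".lock";
}

# A test can then override the wrapper instead of assigning to $$:
{
    no warnings 'redefine';
    local *main::current_pid = sub { 12345 };
    print lockfile_name('/tmp'), "\n";   # /tmp/app.12345.lock
}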

IPC::Messaging, which provides sorta kinda Erlang-like messaging (not performance-wise, syntax-wise), does that to $$ to replace it with an object which numifies to the original pid. This is done to have a convenient reference to a "self-process" which one can call methods on (= send messages to).
Full disclosure: I am the author of the module.
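To make that concrete, here is a rough sketch of the idea (not the module's actual code; My::Process is a made-up name): an object that numifies and stringifies to the original pid via overload, which is the kind of value the module arranges for $$ to hold, and why it needs $$ to be writable.

package My::Process;
use strict;
use warnings;
use overload
    '0+'     => sub { $_[0]->{pid} },   # numifies to the real pid
    '""'     => sub { $_[0]->{pid} },   # stringifies the same way
    fallback => 1;

sub new  { my ($class) = @_; bless { pid => $$ }, $class }
sub send { my ($self, $msg) = @_; print "would send '$msg' to pid $self\n" }

package main;

my $self_proc = My::Process->new;
print "pid: ", $self_proc + 0, "\n";   # behaves like the original pid
$self_proc->send("hello");             # but also accepts method calls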

If you were implementing a fork()-like system call, you would need to assign to the global $$.

Related

Progress 4GL - What's the benefit of placing variable declarations at the top of the procedure?

I've been doing Progress 4GL for 8 years, though it's not my main responsibility; I do C++ and Java a lot more. When programming in other languages, it's suggested to keep declarations close to their usage. With 4GL, however, I see people place the declarations at the top of the file. It's even in the coding standard.
I think placing them at the top of the file leads to a 'vertical separation' problem. In most other languages it's even suggested to do the assignment on the same line as the declaration.
The question is: why is it suggested to do so in 4GL? What's the benefit? I know that it's possible to place a declaration anywhere in the file, as long as it comes before the variable is used.
I think the answer is to do with scoping, or the lack of it, within Progress 4GL.
If you are used to Java, say, and read a Progress 4GL program that looks like
DO:
DEFINE VARIABLE x AS INTEGER INITIAL 4.
DISPLAY x.
END.
then you wouldn't expect to be able to use this value of x anywhere else in the program, and you'd expect that any changes made in the block wouldn't affect anything outside the block.
As I understand it, all Progress variables declared within the body of a program are scoped to the whole program, unless they are declared within an internal procedure or function, in which case they are scoped to that procedure or function.
(Incidentally, any default buffers [i.e. undeclared] you use within an internal procedure/function are scoped to the whole program, not just the procedure or function, so you need to be very careful to explicitly declare buffers in functions you intend to use recursively.)
I therefore think the convention of declaring variables at the beginning of a program reflects the fact that Progress will treat them as if they had been declared there, regardless of where you put the declaration.
There is absolutely no benefit in scoping anything to the program as a whole when it could be scoped smaller.
Smaller scopes are easier to test, give less possibility of namespace conflict, and less opportunity for error.
Tightly scoped named buffers are especially useful when writing to the database because they eliminate the possibility of there ever being some other part of your code that uses the same buffer and causes a share-lock, i.e., this fails to compile:
do for b-customer transaction:
find b-customer where .... exclusive...
...
end.
...
find b-customer...
On the other hand, procedures and functions (and include files...) that share scope with the main body of code are a major source of bugs, because when you pick up your variable or whatever, you can never be entirely certain where it has been...
All of this is just basic Structured Programming, of course. It's true for every language and has been accepted practice since the '70s.
The "reason" that you usually see variables defined at the top is simple. Habit. That is just how things were done in the bad old days.
A lot of old code, or code written by old fossils, is written that way. No matter the language.
Some languages (COBOL springs to mind) even formalized it.
Is there any advantage to such an approach?
Not especially. I guess you could argue "they are all in one place and easy to find" but that isn't very compelling.
"Habit" is actually more compelling ;) If you are working with a team that expects a certain style or in an application where a particular style is prevalent then you should think twice before unilaterally throwing out a new way of doing things - the confusion could be a bigger problem than the advantages gained.

How can I tell if a Perl module is actually used in my program?

I have been on a "cleaning spree" lately at work, doing a lot of touch-up work that should have been done a while ago. One thing I have been doing is deleting modules that were imported into files and never used, or that were used at one point but aren't anymore. To do this I have just been deleting an import and running the program's test file, which gets really, really tedious.
Is there any programmatic way of doing this? Short of me writing a program myself to do it.
Short answer: you can't.
Longer, possibly more useful answer: you won't find a general-purpose tool that will tell you with 100% certainty whether the module you're purging is actually used. But you may be able to build a special-purpose tool to help with the manual search that you're currently doing on your codebase. For example, try a wrapper around your test suite that removes the use statements for you and ignores any error messages except those that say Undefined subroutine &__PACKAGE__::foo or otherwise indicate a missing feature of some module. The wrapper could then automatically perform a dumb source scan on the codebase of the module being purged to see whether the missing subroutine foo (or other feature) might be defined in the unwanted module.
You can supplement this with Devel::Cover to determine which parts of your code don't have tests so you can manually inspect those areas and maybe get insight into whether they are using code from the module you're trying to purge.
Due to the halting problem, you can't statically determine whether any program of sufficient complexity will exit or not. This applies to your problem because the "last" instruction of your program might be the one that uses the module you're purging. And since it is impossible to determine what the last instruction is, or if it will ever be executed, it is impossible to statically determine if that module will be used. Further, in a dynamic language, which can extend the program during its run, analysis of the source or even the post-compile symbol tables would only tell you what was calling the unwanted module just before run-time (whatever that means).
Because of this you won't find a general-purpose tool that works for all programs. However, if you are positive that your code doesn't use certain run-time features of Perl, you might be able to write a tool suited to your program that can determine whether code from the module you're purging will actually be executed.
You might create alternative versions of the modules in question, which have only an AUTOLOAD method (and an import method) in them. Make this AUTOLOAD method croak on use. Put this module first into the include path.
You might refine this method by making AUTOLOAD only log the usage and then load the real module and forward the original function call. You could also have a subroutine first in @INC which creates the fake module on the fly if necessary.
Of course you need good test coverage to detect even rare uses.
This concept is definitely not perfect, but it might work with lots of modules and simplify the testing.
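A minimal sketch of such a stand-in module (Some::Module and the fakes/ directory are placeholder names; point them at the module you're purging):

# fakes/Some/Module.pm -- put the fakes/ directory first in @INC,
# e.g. with:  use lib 'fakes';  or  PERL5LIB=fakes prove t/
package Some::Module;
use strict;
use warnings;
use Carp qw(croak);

# Croak if anyone imports from the module...
sub import { croak "Some::Module was imported by " . join(', ', caller) }

# ...or calls any function/method in it.
sub AUTOLOAD {
    our $AUTOLOAD;
    return if $AUTOLOAD =~ /::DESTROY$/;   # don't blow up in destructors
    croak "$AUTOLOAD was called by " . join(', ', caller);
}

1;

If the full test suite passes with the fake in place, nothing exercised by the tests uses the module.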

How to add logging information to perl legacy code

I have a medium-to-large system built in Perl that has been developed over the last 15 years and consists of many scripts and .pm files.
In order to improve the system I need more data. The easiest way I can see to get that data is to have every function in the code print its start and end times to some log, so it becomes possible to understand what is taking the most time.
However, this is an old system, some parts are less maintainable than others, and on top of that it needs to stay running, which means that to get real data I need it to print this from production.
What I want to do is to override the function declarations in some way, wrapping each function so that on entry it prints a line like
NAME start STARTTIME PARAMS
and when it leaves the function
NAME ended STARTTIME PARAMS
Can anybody point me in the right direction?
Thanks
Take a look at Devel::NYTProf. It can profile the amount of time that all of your subs are taking (and do a lot more). It doesn't involve a lot of messy code modification; instead you just run your script with it:
perl -d:NYTProf your_script.pl
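The run writes its data to a nytprof.out file in the current directory; the nytprofhtml tool that ships with the module then turns that into a browsable per-file, per-subroutine HTML report (the --open flag, where supported, opens it in your browser):

nytprofhtml --open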
The previous answers are spot on (I especially recommend Devel::NYTProf). However, a more general technique you could apply to gather data about your subroutines' behaviour is fiddling with the symbol table: "appending" (or prepending) code to the actual sub's code.
A couple of pointers:
In Perl, can I call a method before executing every function in a package? (this answer shows a code example you could adapt to your particular situation)
Hook::LexWrap is a module that lets you augment subroutine behaviour in several ways, without touching the original code
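As a rough illustration of the symbol-table approach (a sketch only; instrument_package is a made-up helper, and it ignores subtleties such as void context and prototypes that Hook::LexWrap handles properly):

use strict;
use warnings;
use Time::HiRes qw(time);

# Replace every named sub in a package with a wrapper that logs
# entry and exit, then calls the original.
sub instrument_package {
    my ($pkg) = @_;
    no strict 'refs';
    no warnings 'redefine';
    for my $name (sort keys %{"${pkg}::"}) {
        my $full = "${pkg}::${name}";
        next unless defined &{$full};   # skip stash entries that aren't subs
        my $orig = \&{$full};
        *{$full} = sub {
            warn "$full start ", time, " [@_]\n";
            my @ret = wantarray ? $orig->(@_) : scalar $orig->(@_);
            warn "$full ended ", time, "\n";
            return wantarray ? @ret : $ret[0];
        };
    }
}

Calling instrument_package('Some::Package') at startup would then emit the NAME start / NAME ended lines for that package without editing its source.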
HTH
Sounds like you need to use a profiler:
http://www.perl.org/about/whitepapers/perl-profiling.html
Perl profilers usually have a huge impact on the program performance, so using them in production may not be a great idea.
You can try Devel::ContinuousProfiler that claims to have very low impact (I myself have never used it, just discovered it this morning!)

Is it bad to use $$ in Prototype?

The Prototype JS API documentation mentions the $$() function, which allows you to select and extend elements based on CSS selectors, like the $() function in jQuery does.
However, on that page, $$ is presented as some sort of last resort:
Sometimes the usual tools from your DOM arsenal just aren't enough to quickly find elements or collections of elements. If you know the DOM tree structure, you can simply resort to CSS selectors to get the job done.
Why is that? Should I stay away from $$ and just use document.getElementsByClassName (ugh) instead?
Based on that quote, I'd say they are encouraging you to use $$(). $$() offers you a cross-browser way to access elements quickly and easily. On the other hand, document.getElementsByClassName() is either buggy or not functional in IE versions up to and including version 8.
In a complicated project, I try to stay away from using $$() so that I don't accidentally select something I don't want. For a smaller project, I wouldn't worry. I can usually accomplish what I need with $(Element).childElements or $(Element).immediateDescendants instead.

How can I represent sets in Perl?

I would like to represent a set in Perl. What I usually do is use a hash with some dummy value, e.g.:
my %hash=();
$hash{"element1"}=1;
$hash{"element5"}=1;
Then use if (defined $hash{$element_name}) to decide whether an element is in the set.
Is this a common practice? Any suggestions on improving this?
Also, should I use defined or exists?
Thank you
Yes, building hash sets that way is a common idiom. Note that:
my @keys = qw/a b c d/;
my %hash;
@hash{@keys} = ();
is preferable to using 1 as the value, because undef takes up significantly less space. This also forces you to use exists (which is the right choice anyway).
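A quick demonstration of why the undef values push you toward exists:

my %set;
@set{qw/a b c/} = ();                              # all three values are undef

print exists  $set{a} ? "in set\n" : "absent\n";   # in set
print defined $set{a} ? "in set\n" : "absent\n";   # absent -- defined misses undef values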
Use one of the many Set modules on CPAN. Judging from your example, Set::Light or Set::Scalar seem appropriate.
I can defend this advice with the usual arguments in favour of CPAN (disregarding possible synergy effects).
How can we know that look-up is all that is needed, both now and in the future? Experience teaches that even the simplest programs expand and sprawl. Using a module would anticipate that.
An API is much nicer for maintenance, and for people who need to read and understand the code in general, than an ad-hoc implementation, as it allows one to think about partial problems at different levels of abstraction.
Related to that, if it turns out that the overhead is undesirable, it is easy to go from a module to a simple implementation by removing indirections or paring down data structures and source code. On the other hand, if you later need more features, it is moderately more difficult to go the other way around.
CPAN modules are already tested and, to some extent, thoroughly debugged, and perhaps the API also underwent improvement over time, whereas with an ad-hoc implementation programmers usually implement the first design that comes to mind.
It rarely turns out that picking a module at the beginning was the wrong choice.
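For example, with Set::Scalar the hash-based code above might become something like this (a sketch; check the module's documentation for the full API):

use Set::Scalar;

my $set = Set::Scalar->new("element1", "element5");

print "present\n" if $set->has("element1");   # membership test
$set->insert("element9");                     # grow the set
$set->delete("element1");                     # shrink it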
That's how I've always done it. I would tend to use exists rather than defined, but they should both work in this context.