perl IO eventhandler for untainting strings - perl

How do I create an event handler in my Perl code to intercept all File/Directory/system-based calls, so that I can untaint input in a just-in-time fashion.
I have lots of IO access in my script, and I find adding manual code for untainting cumbersome.
Can this be done without need to install a third-party CPAN module?

You could try taking an aspect-oriented approach but it does require installing a CPAN module, Aspect.
To capture calls to a particular method / function, you define a pointcut (taken from the Aspect POD):
$pointcut = call qr/^Person::[gs]et_/; # defines a collection of events
Then define the code to take before the call:
$before = before {
print "g/set will soon be called";
} $pointcut;
Although I'm not sure if the Aspect module allows you to trap calls to the CORE::* namespace.

How do expect to untaint general data? If you're just going to blindly accept everything despite its source, there's no point in using taint-checking.
You might want to read the "Secure Programming Techniques" chapter in Mastering Perl. I give quite a bit of advice for dealing with this sort of stuff. However, any good advice is going to be targeted at specific situations, not generalizations.

Related

How can I tell if a Perl module is actually used in my program?

I have been on a "cleaning spree" lately at work, doing a lot of touch-up stuff that should have been done awhile ago. One thing I have been doing is deleted modules that were imported into files and never used, or they were used at one point but not anymore. To do this I have just been deleting an import and running the program's test file. Which gets really, really tedious.
Is there any programmatic way of doing this? Short of me writing a program myself to do it.
Short answer, you can't.
Longer possibly more useful answer, you won't find a general purpose tool that will tell you with 100% certainty whether the module you're purging will actually be used. But you may be able to build a special purpose tool to help you with the manual search that you're currently doing on your codebase. Maybe try a wrapper around your test suite that removes the use statements for you and ignores any error messages except messages that say Undefined subroutine &__PACKAGE__::foo and other messages that occur when accessing missing features of any module. The wrapper could then automatically perform a dumb source scan on the codebase of the module being purged to see if the missing subroutine foo (or other feature) might be defined in the unwanted module.
You can supplement this with Devel::Cover to determine which parts of your code don't have tests so you can manually inspect those areas and maybe get insight into whether they are using code from the module you're trying to purge.
Due to the halting problem you can't statically determine whether any program, of sufficient complexity, will exit or not. This applies to your problem because the "last" instruction of your program might be the one that uses the module you're purging. And since it is impossible to determine what the last instruction is, or if it will ever be executed, it is impossible to statically determine if that module will be used. Further, in a dynamic language, which can extend the program during it's run, analysis of the source or even the post-compile symbol tables would only tell you what was calling the unwanted module just before run-time (whatever that means).
Because of this you won't find a general purpose tool that works for all programs. However, if you are positive that your code doesn't use certain run-time features of Perl you might be able to write a tool suited to your program that can determine if code from the module you're purging will actually be executed.
You might create alternative versions of the modules in question, which have only an AUTOLOAD method (and import, see comment) in it. Make this AUTOLOAD method croak on use. Put this module first into the include path.
You might refine this method by making AUTOLOAD only log the usage and then load the real module and forward the original function call. You could also have a subroutine first in #INC which creates the fake module on the fly if necessary.
Of course you need a good test coverage to detect even rare uses.
This concept is definitely not perfect, but it might work with lots of modules and simplify the testing.

function overloading by return type

Perl is one of such language which supports the function overloading by return type.
Simple example of this is wantarray().
Few nice modules are available in CPAN which extends this wantarray() and provide overloading for many other return types. These modules are Contextual::Return and Want. Unfortunately, I can not use these modules as both of these fails the perl critic with perl version 5.8.9 (I can not upgrade this perl version).
So, I am thinking to write my own module like Contextual::Return and Want but with very minimal. I tried to understand Contextual::Return and Want modules code, but I am not an expert.
I need function overloading for return types BOOL, OBJREF, LIST, SCALAR only.
Please help me by providing some guidelines, how can I start for it.
Modules that play with Perl's syntax in the way that Contextual::Return and Want do are pretty much bound to fall foul of Perl::Critic. In this case the main transgressions are occasionally disabling strict and using subroutine prototypes, which are minimal.
I personally believe it is a foolish rule that insists that all code must pass an arbitrary set of tests with no exceptions, but I also think that any code that behaves very differently depending on the context in which it is called is likely to be badly designed and difficult to understand and maintain. It is rare to see even wantarray used, as Perl generally does the right thing without you having to explain.
I think you may have come across a module that looks interesting to use, and want to incorporate it into your code somehow. Can you change my mind by showing an example of a subroutine that would require the comprehensive context checking that you describe?

What exactly is a source filter?

Whenever I see the term source filter I am left wondering as to what it refers to.
Aside from a formal definition, I think an example would also be helpful to drive the message home.
A source filter is a module that modifies some other code before it is evaluated. Therefore the code that is executed is not what the programmer sees when it is written. You can read more about source filters (in the Perl context) at perldoc perlfilter. Some examples are Smart::Comments which allows the programmer to leave debugging commands in comments in the code and employ them only if desired, another is PDL::NiceSlice which is sugar for slicing PDL objects.
Edit:
For more information on usage (should you wish to brave the beast), read the documentation for Filter::Simple which is a typical way to create filters.
Alternatively there is a new and different way to muck about with the source: Devel::Declare lets you interact with Perl's own parser, letting you do many of the same type of thing as a source filter, but without the source filter. This can be "safer" in some respect, yet it has a more limited scope.
A source filter is a form of module which affects the way in which a file use-ing it will be parsed. They are commonly used to simulate syntactical features which Perl does not have natively -- for instance, the Switch source filter was often used to simulate switch statements before Perl's given { } construction was available.
Source filters work by taking the text of the module as input, performing some processing on it, and outputting the filtered source code. For a simple example of how a source filter is implemented, as well as more details, see the perldoc page for perlfilter.
They are pre-processors. They change the source code before it reaches the Perl compiler. You can do scary things with them, in effect implementing your own language, with all the effects this has on readability (for others), robustness (writing parsers is hard) and maintainability (debugging gets tricky when your idea of what the source code is differs from what compiler and runtime think it is).

Signal/Slot mechanism for Perl

I'm wondering if there is an equivalent to Qt's signal/slot mechanism for Perl. I have looked into POE, but since it's huge, I couldn't find anything useful.
Thank you in advance,
Perhaps you are looking for something like Object::Event, an API for registering and emitting events, mostly for AnyEvent, but I imagine you could use it elsewhere. Gtk2 also has a mechanism similar to QT's, especially combined with Glade XML, which lets you automatically map event slots|signals to perl object methods or functions. AnyEvent is a generic event loop which supports Gtk/Glib and POE, amongst others, and is much easier to grok than the large set of modules that is POE.
The concept is generally called Publish/Subscribe. The search result for pubsub on CPAN gives you what you want.

In Perl, is it better to use a module than to require a file?

Another question got me thinking about different methods of code reuse: use vs. require vs. do
I see a lot of posts here where the question centers around the use of require to load and execute code. This seems to me to be an obvious bad practice, but I haven't found any good resources on the issue that I can point people to.
perlfaq8 covers the difference between use and require, but it doesn't offer any advice in regard to preference (as of 5.10--in 5.8.8 there is a quick bit of advice in favor of use).
This topic seems to suffer from a lack of discussion. I have a few questions I'd like to see discussed:
What is the preferred method of code reuse in Perl?
use ModuleName;
require ModuleName;
require 'file.pl';
do 'file.pl';
What is the difference between require ModuleName and require "file.pl"?
Is it ever a good idea to use require "file.pl"? Why or why not?
Standard practice is to use use most of the time, require occasionally, and do rarely.
do 'file' will execute file as a Perl script. It's almost like calling eval on the contents of the file; if you do the same file multiple times (e.g. in a loop) it will be parsed and evaluated each time which is unlikely to be what you want. The difference between do and eval is that do can't see lexical variables in the enclosing scope, which makes it safer. do is occasionally useful for simple tasks like processing a configuration file that's written in the form of Perl code.
require 'file' is like do 'file' except that it will only parse any particular file one time and will raise an exception if something goes wrong. (e.g. the file can't be found, it contains a syntax error, etc.) The automatic error checking makes it a good replacement for do 'file' but it's still only suited for the same simple uses.
The do 'file' and require 'file' forms are carryovers from days long past when the *.pl file extension meant "Perl Library." The modern way of reusing code in Perl is to organize it into modules. Calling something a "module" instead of a "library" is just semantics, but the words mean distinctly different things in Perl culture. A library is just a collection of subroutines; a module provides a namespace, making it far more suitable for reuse.
use Module is the normal way of using code from a module. Note that Module is the package name as a bareword and not a quoted string containing a file name. Perl handles the translation from a package name to a file name for you. use statements happen at compile time and throw an exception if they fail. This means that if a module your code depends on isn't available or fails to load the error will be apparent immediately. Additionally, use automatically calls the import() method of the module if it has one which can save you a little typing.
require Module is like use Module except that it happens at runtime and does not automatically call the module's import() method. Normally you want to use use to fail early and predictably, but sometimes require is better. For example, require can be used to delay the loading of large modules which are only occasionally required or to make a module optional. (i.e. use the module if it's available but fall back on something else or reduce functionality if it isn't.)
Strictly speaking, the only difference between require Module and require 'file' is that the first form triggers the automatic translation from a package name like Foo::Bar to a file name like Foo/Bar.pm while the latter form expects a filename to start with. By convention, though, the first form is used for loading modules while the second form is used for loading libraries.
There is a major preference for using use, because it happens at an earlier state BEGIN {} during compilation, and the errors tend to be propagated to the user at a more appropriate time. It also calls the sub import {} function which gives the caller control over the import process. This is something that is heavily used. You can get the same effect, by calling the specific namespace's import, but that requires you to know the name of the namespace, and the file, and to code the call to the subroutine... which is a lot more work. Conversely, use, just requires you to know the namespace, and then it requires the file with the matching namespace -- thus making the link between namespaces and files less of an conscious to the user.
Read perldoc -f use, and perldoc -f require, for more information. Per perldoc -f use:
use is the same as BEGIN { require Module; Module->import( LIST ); } Which is just a lot more ugly.
The main difference is around import/export. use is preferred when you're making use of a module because it allows you to specify which routines you wish to import into your namespace :
use MyModule qw(foo bar baz); # allows foo(), bar() and baz() to be used
use MyModule qw(); # Requires explicit naming (e.g. MyModule::foo).
use also gives runs the module's import() procedure which is often used to set the module up.
See the perldoc for use for more detail.
Using use to module will include module during compile time which increases speed but uses more memory, whereas using require module will include during run time. Requiring a module without using import when needed uses less memory but reduces speed.