What exactly is a source filter? - perl

Whenever I see the term source filter I am left wondering as to what it refers to.
Aside from a formal definition, I think an example would also be helpful to drive the message home.

A source filter is a module that modifies some other code before it is evaluated. Therefore the code that is executed is not what the programmer sees when it is written. You can read more about source filters (in the Perl context) at perldoc perlfilter. Some examples are Smart::Comments which allows the programmer to leave debugging commands in comments in the code and employ them only if desired, another is PDL::NiceSlice which is sugar for slicing PDL objects.
Edit:
For more information on usage (should you wish to brave the beast), read the documentation for Filter::Simple which is a typical way to create filters.
Alternatively there is a new and different way to muck about with the source: Devel::Declare lets you interact with Perl's own parser, letting you do many of the same type of thing as a source filter, but without the source filter. This can be "safer" in some respect, yet it has a more limited scope.

A source filter is a form of module which affects the way in which a file use-ing it will be parsed. They are commonly used to simulate syntactical features which Perl does not have natively -- for instance, the Switch source filter was often used to simulate switch statements before Perl's given { } construction was available.
Source filters work by taking the text of the module as input, performing some processing on it, and outputting the filtered source code. For a simple example of how a source filter is implemented, as well as more details, see the perldoc page for perlfilter.

They are pre-processors. They change the source code before it reaches the Perl compiler. You can do scary things with them, in effect implementing your own language, with all the effects this has on readability (for others), robustness (writing parsers is hard) and maintainability (debugging gets tricky when your idea of what the source code is differs from what compiler and runtime think it is).

Related

invoke coq typechecker from external programs

What would be the best way to interact with Coq from an external program? For example, let's say I want to programmatically generate programs / proofs in some language other than Coq and I just want to call Coq to typecheck them. Is there a standard way to do something like that?
You have a couple of options.
Construct .v files, invoke coqc, check the return code and parse the output of coqc.
This is, in some sense, the most stable way to interact with Coq. It has the most inter-version stability. It's also the most inflexible; you create a .v file, and check it all in one go.
For an example of this method, see my Coq bug minimizer (specificially get_coq_output in diagnose_error.py), which repeatedly makes small alterations to a .v file and checks to see that the alterations don't change the error message given by coqc.
Use the XML protocol to communicate with coqtop
This is the method used by CoqIDE and by upcoming versions of ProofGeneral. Logitext invokes from Haskell a custom patched version of coqtop with the pgip protocol, which was an earlier attempt at a more standardized way of communication with the prover (see this issue for more details).
This is becoming more stable, and gives more fine-grained control over what you want checked. For example, it allows you to check multiple proofs within a single session, which is important if you depend on a large library that takes time to load, and need to check many small proofs.
Write a custom OCaml toplevel wrapper for the interface to Coq that you want
The main example of this that I'm aware of is PIDEtop, which is used in the Coqoon Eclipse plugin. I suspect that some of the other entries in the GUI section of Related Tools use this method.
Note that coqtop is itself a toplevel wrapper in this style; the files in the toplevel/ folder of the Coq project are likely to be informative.
This gives you the most flexibility and reusability, at the cost of having to design your own protocol, or implement an existing protocol.
Write your external program in OCaml and link with Coq
Much like (3), this method gives you as much flexibility as you want. In fact, the only difference between this and (3) is that in (3), you separate out the communication with Coq into its own binary, whereas here, you fuse communication with Coq with the other functionality of your program. I'm not aware of programs in this style, though I believe coqchk may qualify, as I think it shares a couple of files with the Coq kernel (see the checker/ folder in the Coq codebase).
Regardless of which way you choose, I think that modelling off of existing projects will be more fruitful than making use of (as-yet incomplete) documentation on the various APIs and protocols. The API has been undergoing a lot of revision recently, in an attempt to get it into a reasonable and stable state, and the XML protocol has also been subject to recent improvements; #ejgallego has been the driving force behind much of these improvements.

How is Perl useful as a metadata tool?

In The Pragmatic Programmer:
Normally, you can simply hide a third-party product behind a
well-defined, abstract interface. In fact , we've always been able to
do so on any project we've worked on. But suppose you couldn't isolate
it that cleanly. What if you had to sprinkle certain statements
liberally throughout the code? Put that requirement in metadata, and
use some automatic mechanism, such as Aspects (see page 39 ) or Perl,
to insert the necessary statements into the code itself.
Here the author is referring to Aspect Oriented Programming and Perl as tools that support "automatic mechanisms" for inserting metadata.
In my mind I envision some type of run-time injection of code. How does Perl allow for "automatic mechanisms" for inserting metadata?
Skip ahead to the section on Code Generators. The author provides a number of examples of processing input files to generate code, including this one:
Another example of melding environments using code generators happens when different programming languages are used in the same application. In order to communicate, each code base will need some information in commondata structures, message formats, and field names, for example. Rather than duplicate this information, use a code generator. Sometimes you can parse the information out of the source files of one language and use it to generate code in a second language. Often, though, it is simpler to express it in a simpler, language-neutral representation and generate the code for both languages, as shown in Figure 3.4 on the following page. Also see the answer to Exercise 13 on page 286 for an example of how to separate the parsing of the flat file representation from code generation.
The answer to Exercise 13 is a set of Perl programs used to generate C and Pascal data structures from a common input file.

How can I tell if a Perl module is actually used in my program?

I have been on a "cleaning spree" lately at work, doing a lot of touch-up stuff that should have been done awhile ago. One thing I have been doing is deleted modules that were imported into files and never used, or they were used at one point but not anymore. To do this I have just been deleting an import and running the program's test file. Which gets really, really tedious.
Is there any programmatic way of doing this? Short of me writing a program myself to do it.
Short answer, you can't.
Longer possibly more useful answer, you won't find a general purpose tool that will tell you with 100% certainty whether the module you're purging will actually be used. But you may be able to build a special purpose tool to help you with the manual search that you're currently doing on your codebase. Maybe try a wrapper around your test suite that removes the use statements for you and ignores any error messages except messages that say Undefined subroutine &__PACKAGE__::foo and other messages that occur when accessing missing features of any module. The wrapper could then automatically perform a dumb source scan on the codebase of the module being purged to see if the missing subroutine foo (or other feature) might be defined in the unwanted module.
You can supplement this with Devel::Cover to determine which parts of your code don't have tests so you can manually inspect those areas and maybe get insight into whether they are using code from the module you're trying to purge.
Due to the halting problem you can't statically determine whether any program, of sufficient complexity, will exit or not. This applies to your problem because the "last" instruction of your program might be the one that uses the module you're purging. And since it is impossible to determine what the last instruction is, or if it will ever be executed, it is impossible to statically determine if that module will be used. Further, in a dynamic language, which can extend the program during it's run, analysis of the source or even the post-compile symbol tables would only tell you what was calling the unwanted module just before run-time (whatever that means).
Because of this you won't find a general purpose tool that works for all programs. However, if you are positive that your code doesn't use certain run-time features of Perl you might be able to write a tool suited to your program that can determine if code from the module you're purging will actually be executed.
You might create alternative versions of the modules in question, which have only an AUTOLOAD method (and import, see comment) in it. Make this AUTOLOAD method croak on use. Put this module first into the include path.
You might refine this method by making AUTOLOAD only log the usage and then load the real module and forward the original function call. You could also have a subroutine first in #INC which creates the fake module on the fly if necessary.
Of course you need a good test coverage to detect even rare uses.
This concept is definitely not perfect, but it might work with lots of modules and simplify the testing.

Using Inline::CPP vs SWIG - when?

In this question i saw two different answers how to directly call functions written in C++
Inline::CPP (and here are more, like Inline::C, Inline::Lua, etc..)
SWIG
Handmade (as daxim told - majority of modules are handwritten)
I just browsed nearly all questions in SO tagged [perl][swig] for finding answer for the next questions:
What are the main differences using (choosing between) SWIG and Inline::CPP or Handwritten?
When is the "good practice" - recommented to use Inline::CPP (or Inline:C) and when is recommented to use SWIG or Handwritten?
As I thinking about it, using SWIG is more universal for other uses, like asked in this question and Inline::CPP is perl-specific. But, from the perl's point of view, is here some (any) significant difference?
I haven't used SWIG, so I cannot speak directly to it. But I'm pretty familiar with Inline::CPP.
If you would like to compose C++ code that gets compiled and becomes callable from within Perl, Inline::CPP facilitates this. So long as the C++ code doesn't change, it should only compile once. If you base a module on Inline::CPP, the code will be compiled at module install time, so another user never really sees the first time compilation lag; it happens at install time, just before the testing phase.
Inline::CPP is not 100% free of portability isues. The target user must have a C++ compiler that is of similar flavor to the C compiler used to build Perl, and the C++ standard libraries should be of versions that produce binary-compatible code with Perl. Inline::CPP has about a 94% success rate with the CPAN testers. And those last 6% almost always boil down to issues of the installation process not correctly deciphering what C++ compiler and libraries to use. ...and of those, it usually comes down to the libraries.
Let's assume you as a module author find yourself in that 95% who have no problem getting Inline::CPP installed. If you know that your target audience will fall into that same category, then producing a module based on Inline::CPP is simple. You basically have to add a couple of directives (VERSION and NAME), and swap out your Makefile.PL's ExtUtils::MakeMaker call to Inline::MakeMaker (it will invoke ExtUtils::MakeMaker). You might also want a CONFIGURE_REQUIRES directive to specify a current version of ExtUtils::MakeMaker when you create your distribution; this insures that your users have a cleaner install experience.
Now if you're creating the module for general consumption and have no idea whether your target user will fit that 94% majority who can use Inline::CPP, you might be better off removing the Inline::CPP dependency. You might want to do this just to minimize the dependency chain anyway; it's nicer for your users. In that case, compose your code to work with Inline::CPP, and then use InlineX::CPP2XS to convert it to a plain old XS module. Your user will now be able to install without the process pulling Inline::CPP in first.
C++ is a large language, and Inline::CPP handles a large subset of it. Pay attention to the typemap file to determine what sorts of parameters can be passed (and converted) automatically, and what sorts are better dealt with using "guts and API" calls. One feature I wouldn't recommend using is automatic string conversion, as it would produce Unicode-unfriendly conversions. Better to handle strings explicitly through API calls.
The portion of C++ that isn't handled gracefully by Inline::CPP is template metaprogramming. You're free to use templates in your code, and free to use the STL. However, you cannot simply pass STL type parameters and hope that Inline::CPP will know how to convert them. It deals with POD (basic data types), not STL stuff. Furthermore, if you compose a template-based function or object method, the C++ compiler won't know what context Perl plans to call the function in, so it won't know what type to apply to the template at compiletime. Consequently, the functions and object methods exposed directly to Inline::CPP need to be plain functions or methods; not template functions or classes.
These limitations in practice aren't hard to deal with as long as you know what to expect. If you want to expose a template class directly to Inline::CPP, just write a wrapper class that either inherits or composes itself of the template class, but gives it a concrete type for Inline::CPP to work with.
Inline::CPP is also useful in automatically generating function wrappers for existing C++ libraries. The documentation explains how to do that.
One of the advantages to Inline::CPP over Swig is that if you already have some experience with perlguts, perlapi, and perlcall, you will feel right at home already. With Swig, you'll have to learn the Swig way of doing things first, and then figure out how to apply that to Perl, and possibly, how to do it in a way that is CPAN-distributable.
Another advantage of using Inline::CPP is that it is a somewhat familiar tool in the Perl community. You are going to find a lot more people who understand Perl XS, Inline::C, and to some extent Inline::CPP than you will find people who have used Swig with Perl. Although XS can be messy, it's a road more heavily travelled than using Perl with Swig.
Inline::CPP is also a common topic on the inline#perl.org mailing list. In addition to myself, the maintainer of Inline::C and several other Inline-family maintainers frequent the list, and do our best to assist people who need a hand getting going with the Inline family of modules.
You might also find my Perl Mongers talk on Inline::CPP useful in exploring how it might work for you. Additionally, Math::Prime::FastSieve stands as a proof-of-concept for basing a module on Inline::CPP (with an Inline::CPP dependency). Furthermore, Rob (sisyphus), the current Inline maintainer, and author of InlineX::CPP2XS has actually included an example in the InlineX::CPP2XS distribution that takes my Math::Prime::FastSieve and converts it to plain XS code using his InlineX::CPP2XS.
You should probably also give ExtUtils::XSpp a look. I think it requires you to declare a bit more stuff than Inline::CPP or SWIG, but it's rather powerful.

perl IO eventhandler for untainting strings

How do I create an event handler in my Perl code to intercept all File/Directory/system-based calls, so that I can untaint input in a just-in-time fashion.
I have lots of IO access in my script, and I find adding manual code for untainting cumbersome.
Can this be done without need to install a third-party CPAN module?
You could try taking an aspect-oriented approach but it does require installing a CPAN module, Aspect.
To capture calls to a particular method / function, you define a pointcut (taken from the Aspect POD):
$pointcut = call qr/^Person::[gs]et_/; # defines a collection of events
Then define the code to take before the call:
$before = before {
print "g/set will soon be called";
} $pointcut;
Although I'm not sure if the Aspect module allows you to trap calls to the CORE::* namespace.
How do expect to untaint general data? If you're just going to blindly accept everything despite its source, there's no point in using taint-checking.
You might want to read the "Secure Programming Techniques" chapter in Mastering Perl. I give quite a bit of advice for dealing with this sort of stuff. However, any good advice is going to be targeted at specific situations, not generalizations.