How to deal with Perl (vague) errors

I have been encountering errors in Perl code that are reported at places that have nothing to do with the actual error. I fixed one such error only after hours of removing and re-adding code line by line, plus some trial and error. One such error is detailed below. My question is: if these issues happen in the future, is there a way to make the Perl compiler help me fix them, or do I have to rewrite the code in some other language (I am considering Java)?
My program looks like this:
use Switch;
use strict;
use warnings;
# ...other modules
sub log {
}
# ...various subroutines
switch ($val1)
{
log(..); # first invocation of log
case ($val2)
...
}
Now, if within sub log I do this:
{
$val3 = POSIX::floor($val2/$val4)*$val4;
$val5="/x/y/$logfilename";
}
I get an error reported at the case statement.
If I move the line $val5="/x/y/$logfilename"; before the $val3 line, there is no error.
Or, if I remove the slashes from $val5 (i.e. $val5="x"), there is no error.
Or, if I write $val5=qq(/x/y/$logfilename); there is no error.
This time I consider myself lucky that I found a workaround, but only after 3 hours of fighting with it. Is there a way to make the Perl compiler report errors accurately?
I have one more similar case to report and can add it if necessary.
Any input is welcome.

Setting aside the problems with your post, the issue you are probably running into is that Switch is implemented as a source filter. This means the source code is preprocessed, converting what looks like a switch/case structure into valid Perl, before the real compiler ever sees it. Perl's parser can get line numbers slightly wrong on complex structures even without help, but when source filters are involved, all bets are off. This is one of the reasons Switch was deprecated (in Perl 5.12) and then removed from the core distribution (in 5.14).
The solution is to stop using Switch and either use Perl's other control structures, or use a perl new enough to support the given/when construct (5.10+), which is modern Perl's version of switch/case (note, though, that given/when has been marked experimental since Perl 5.18).
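For the "other control structures" route, one common Perl idiom (a dispatch table; my choice of illustration, not something the answer prescribes) maps each case value to a code reference, with a fallback standing in for the default case. The keys start/stop and the handler bodies below are hypothetical placeholders for the question's case ($val2) branches. Because this is ordinary Perl, any syntax error is reported by perl's own parser rather than after a source filter has rewritten the code.

```perl
use strict;
use warnings;

# Hypothetical handlers standing in for the question's case bodies.
my %dispatch = (
    start => sub { return "starting $_[0]" },
    stop  => sub { return "stopping $_[0]" },
);

# Look up the handler for $val; fall back to a default, like an "else" case.
sub handle {
    my ($val, $arg) = @_;
    my $handler = $dispatch{$val} || sub { return "no handler for $val" };
    return $handler->($arg);
}

print handle('start', 'job1'), "\n";    # starting job1
print handle('stop',  'job1'), "\n";    # stopping job1
print handle('pause', 'job1'), "\n";    # no handler for pause
```

Adding a case means adding one hash entry, and there is no hidden rewriting step between what you wrote and what perl compiles.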

The LIMITATIONS section of the Switch documentation sounds relevant:
Due to the heuristic nature of Switch.pm's source parsing, the presence of regexes with embedded newlines that are specified with raw /.../ delimiters and don't have a modifier //x are indistinguishable from code chunks beginning with the division operator /. As a workaround you must use m/.../ or m?...? for such patterns. Also, the presence of regexes specified with raw ?...? delimiters may cause mysterious errors. The workaround is to use m?...? instead.
The Switch module is deep black magic. Please don't use it unless you're prepared to debug its problems.

Related

Perl: Is it possible to dynamically fix compile time error?

If I have, for example, the following Perl script:
use strict;
use warnings;
print $x;
When I run this script, compilation will fail with error:
Global symbol "$x" requires explicit package name (did you forget to declare "my $x"?) at ...
Is it possible to write a Perl module that will be called when this error occurs, automatically fix the error, and continue compilation? (Even links to any relevant info are OK.)
# This code is incorrect.
# Here I just ask about such ability
# This code is very weak approximation how it might look
package AutoFix;
sub fix {
$main::x = 'You are defined now';
}
1;
So that the following code would not fail, and would print "You are defined now":
use strict;
use warnings;
use AutoFix;
print $x;
How much work would you like to do to create the code that could figure out what the fix should be? And, will that amount of work be comparable or less to the work required to examine code by hand?
Now, I'm writing all of this having spent quite a bit of time trying to come up with a system to analyze CPAN installer output to figure out what went wrong (a major impetus for CPANPLUS, now relegated to history). It's easy to tell that something is not right, but beyond that is a lot of suffering.
In your example, you have an error about an undeclared variable. How does AutoFix know if that should be a package or a lexical variable? You can guess one or the other, but you actually have two big problems:
What is the intent of the code?
Does the code reflect the actual intent?
Determining the intent of the code is often very difficult even for an experienced human programmer (just read the comments on StackOverflow questions). Code that compiles is often not correct code, in the sense that it doesn't achieve the desired outcome. Furthermore, does the programmer even understand the problem? Does the code the programmer wrote (incorrectly, here) reflect the actual work the code should do? It's difficult for humans in code review to figure this out. Tools like Coverity can flag problems they know about, but they aren't going to be able to correct the code.
But let's say that the programmer understands the problem. Have they correctly expressed that? The longer you've been programming, the more you lean toward "no", in general, in my experience.
This is completely different than the database constraint you mentioned. That's a narrowly targeted fix for an expected and allowed situation. Consider a different parallel: if the record has a New York area code but a Chicago address, should I fix the city? When I was a younger dumbass, I did a similar thing to a database. It was stupid because I thought I knew something I didn't, and everyone who understood the situation recognized it immediately. Even then, those sorts of constraints are how we model what we know about the world, not what the world actually is.
Now, to make AutoFix, you need to make something that can look at code, understand it, and figure out what it should do. You can make guesses, but you have no basis for playing the probabilities there.
Technical matters can't solve this. AutoFix can undo the work of pragmas such that some classes of errors don't show up, but so what? The program with an error just continues? How does that help anyone?
Not only that, compilers tend to complain at the point where they realize they can't parse something, and what they complain about is often not the actual problem. The first thing I teach people about debugging is to look at the statement immediately preceding the line number in the error message. Any error message you catch can have a virtually infinite number of causes.
Consider this code, which fails in the same way as your example (same error message) but for a completely different but common reason:
use strict;
use warnings;
my $x = 5,
print $x++;
How do you figure out what the fix should be? It's not about declaring $x.
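The reason is that a my() variable only becomes visible after the end of the statement that declares it, and the trailing comma turns my $x = 5, print $x++; into a single statement. The sketch below compiles both versions with string eval (which inherits use strict from the enclosing scope) to show the difference; the variable names are just for illustration.

```perl
use strict;
use warnings;

# The answer's snippet: a comma instead of a semicolon after "my $x = 5".
my $broken = 'my $x = 5, print $x++; 1';
# The same code with the declaration properly terminated.
my $fixed  = 'my $x = 5; $x++; $x';

# String eval compiles under the "use strict" in effect here,
# so the broken snippet fails to compile and never runs.
eval $broken;
my $broken_error = $@;
my $fixed_result = eval $fixed;

print "broken: $broken_error";      # Global symbol "$x" requires explicit package name ...
print "fixed:  $fixed_result\n";    # 6
```

Nothing about the error message points at the comma; you have to know the scoping rule to see it.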
So, you now have two cases, and you build both into your fixer. Then you encounter another case, so you build that in too. And you keep doing this until eventually you have a large dictionary of fixes. Maybe you get a bit crazy and do some machine learning (and wouldn't a corpus of bad code and resolutions be cool?).
But the program still can't just continue. It has to start over, because it has to at least back up to where it should have done something but didn't. You can't merely restart the program, because you don't know whether it's idempotent. Re-running the program might redo work it shouldn't, such as inserting duplicate records into databases.
Having said all that, this sort of thing is related to static analysis and the refactoring browser. Adam Kennedy's Parse Perl Isolated (PPI) project was a first step toward understanding Perl code without compiling it, and then toward the Smalltalk ideal of understanding which parts of code represent the same thing. If you knew that two things named foo were the same thing, you could rearrange code dealing with foo. For example, if you renamed a method from bar to set_bar, you would immediately know which bars you should rename and which belonged to some other class.
Adam wrote Acme::BadExample and challenged anyone to get it to run. He wrote "any given piece of Perl source exists in bizarre pseudo-quantum-like state, in that it demonstrates both duality and indeterminism."
Jos Boumans stepped up and used some mind-bending Perl, which he then showed in Barely Legal XXX Perl, which I think he first presented in 2006. He was amazingly creative in his solutions, and in a way that I wouldn't want in production code.
Perl doesn't even know, by design, what type of thing will be in a variable or even that the method you might call on it will exist. In fact, it defers so much to the runtime, trusting that things will be in place by the time you need them, that we often say "only Perl can parse perl". You literally need to be able to run Perl code to properly compile it since BEGIN blocks can affect the parse. For example, a BEGIN can define a subroutine with a certain arity. How do you parse foo 5, 6? You have to know what has already been defined.
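A small sketch of that ambiguity (the sub names one_arg and any_args are made up): the same token sequence NAME 5, 6 produces different parse trees depending on a prototype the parser has already seen. A named sub definition is processed at compile time, so its prototype is known when the later calls are parsed; that is the same "earlier compile-time state changes the parse" effect a BEGIN block can have.

```perl
use strict;
use warnings;

sub one_arg ($) { return "one_arg(@_)" }   # prototype: exactly one scalar, so it acts as a named unary operator
sub any_args    { return "any_args(@_)" }  # no prototype: an ordinary list operator

# Same tokens, different parse trees:
my @first  = (one_arg 5, 6);    # parsed as (one_arg(5), 6)
my @second = (any_args 5, 6);   # parsed as (any_args(5, 6))

print scalar(@first),  " elements: @first\n";    # 2 elements: one_arg(5) 6
print scalar(@second), " elements: @second\n";   # 1 elements: any_args(5 6)
```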
Perl has other "action at a distance" features that make this even tougher. autodie redefines CORE features to add extra behavior, but you might not be able to see that in the code. You can set default regex flags (and I've seen plenty of big screw ups by people applying /isxm to entire files without checking).
As noted above, auto-fixing compile-time errors is not possible (or at least very hard).
Instead of fixing the compile-time error, try to solve your problem in a different way.
For example, your script uses the variable $x. If you know in advance that you will use it and want it to hold some value (e.g. "You are defined now"), you could use Exporter:
use strict;
use warnings;
use AutoFix qw/ $x /;
print $x;
And AutoFix module will look like:
package AutoFix;
require Exporter;
our @ISA = qw(Exporter);
our @EXPORT_OK = qw( $x $y $z ); # symbols to export on request
... # code which will create instance of $x $y $z on request
1;
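A self-contained, runnable variant of that idea, assuming the module is declared inline in the same file rather than in a separate AutoFix.pm (the %INC assignment is only there so use AutoFix doesn't go looking for that file). Because Exporter imports $x at compile time, use strict accepts the later print $x.

```perl
use strict;
use warnings;

# Hypothetical inline version of the AutoFix module from the answer.
BEGIN {
    package AutoFix;
    require Exporter;
    our @ISA       = ('Exporter');
    our @EXPORT_OK = ('$x');              # exported only on request
    our $x         = 'You are defined now';
    $INC{'AutoFix.pm'} = __FILE__;        # mark as loaded so "use" skips the file search
}

use AutoFix qw($x);    # aliases $main::x to $AutoFix::x at compile time

print "$x\n";          # compiles under strict because $x was imported first
```

The point is that the variable exists before the code using it is compiled, so there is no error to "fix" at all.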
Good luck ;-)

switch vs. given vs. for-when vs. if-elsif-else in Perl

In one of the last scripts I wrote, I needed behavior similar to a switch statement. A simple search for an equivalent in Perl led me to use Switch. At the beginning, all was fine and working, until everything just crashed with errors that were not very descriptive (it happened on a switch statement that had cases with regexes, but strangely it didn't happen on other, similar switch statements).
EDIT: the code that crashed looked like this:
switch ($var) {
case /pattern1/ {...}
case /pattern2/ {...}
...
else {...}
}
That led me to abandon the use of Switch.pm and search for an alternative.
I found given and for-when and of course there's always the straightforward and somewhat naive if-elsif-else.
Why is Switch.pm so unstable?
It seems given and for-when have a similar structure, but I guess there's a difference (because both exist). What is it?
Is if-elsif-else significantly slower than the other options?
Perl's when and smart-matching are experimental, and they won't become features without backward-incompatible changes. You should not use these.
Switch.pm is a source filter, so it can produce incorrect error messages when something's wrong. It also suffers from the same problems as smart-matching. You should not use this.
So, of the options you listed, only one is viable, and it's not any slower at all!
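A minimal sketch of the if-elsif-else route for the question's switch ($var) { case /pattern1/ ... } shape; the patterns and return values below are placeholders, not the asker's real ones. Each case becomes an explicit match, and writing a pattern with m{...} sidesteps any confusion with / used as division.

```perl
use strict;
use warnings;

# Each "case /pattern/" from the question becomes an explicit regex test.
sub route {
    my ($var) = @_;
    if    ($var =~ /^\d+$/)     { return 'number' }
    elsif ($var =~ /^[a-z]+$/i) { return 'word' }
    elsif ($var =~ m{^/})       { return 'path' }    # m{} avoids a '/' delimiter clash
    else                        { return 'other' }   # plays the role of Switch's "else"
}

print route($_), "\n" for qw(42 hello /tmp/x ?!);    # number, word, path, other
```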

General check of missing semicolon

As a Perl beginner I sometimes get compilation errors and have to search a lot to find the cause. In the end it is just a missing semicolon at the end of a line. Perl detects some missing semicolons, but not in general. Is there a way to get this check?
edit:
I know about Perl::Critic but can't use it at the moment. And I don't know if it checks for missing semicolons in general.
Because semicolons actually mean something in Perl and aren't just there for decoration, it's not possible for any tool (even the Perl interpreter itself) to know in every case whether you actually meant to leave off the semicolon or not. Thus, there's no general-case answer to your question; you'll just need to go through your code and make sure it's correct.
As mentioned in my comments, there are various tricks you can try with your editor to expedite the process of finding potentially-incorrect lines; you must, however, either examine and fix these by hand or risk introducing new problems.
The syntax check is perl -c, but it uses the same parser as running the program outright, so it won't catch anything more. Due to Perl's flexible/undecidable syntax, one cannot in general do what you want. That's the downside of comfort and expressiveness.
Upgrade to the latest stable Perl, the parser's error messages got better/more exact over the last years and will correctly recognise many circumstances of a missing semicolon.
Rule of thumb that works for many parsers/other languages: if the error makes no sense, look a couple of lines before.
use diagnostics; usually gives you a nice hint, same as use warnings;. Try to keep a consistent coding style, check perlstyle.
Also you can use Perl::Critic online.
Also as general advice learn how to use packages and modules, try to group code into subs and study the syntax of arrays, lists and hashes. A common mistake is forgetting the ; after an anonymous hashref assignment:
my $hashref = { a => 5, b => 10};
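To see why missing-semicolon errors seem to point at the wrong place, here is a sketch that compiles a hypothetical two-line snippet (semicolon missing on line 1) via string eval and prints perl's complaint. The syntax error is normally reported on the line after the missing semicolon, which is why looking a line or two back helps.

```perl
use strict;
use warnings;

# Hypothetical snippet: line 1 is missing its semicolon.
my $code = <<'END';
my $total = 1 + 2
print "total: $total\n";
END

# Wrap it in an anonymous sub so nothing would run even if it compiled,
# then capture the parser's error message.
eval "sub { $code }";
my $error = $@;
print "perl says: $error";
```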

Filehandles and XML::Simple -> Memory corruption. Can't isolate problem

In a small test file, I can run
#!/usr/bin/perl
use warnings;
use strict;
use open qw{:utf8 :std};
use XML::Simple;
my @cmdline = ("hg", "log", "-v", "--style", "xml");
open my $xml, "@cmdline |" or die "Cannot run @cmdline: $!";
my $xmllog = XMLin($xml, ForceArray => ['logentry', 'parent', 'copy', 'path']);
foreach my $rev (@{$xmllog->{logentry}}) {
#do stuff
}
and it works fine. When I run the same code in a larger program (with the same XML input), it terminates with
*** glibc detected *** /usr/bin/perl: malloc(): memory corruption: 0x0a40e308 ***
(full crash log on pastebin.com)
However, if I make this swap:
#open my $xml, "@cmdline |";
my $xml = `@cmdline`;
then it works (in both files), so this is more a question of curiosity than a real problem for me.
Does anyone have any pointers on what the difference between my test case and the larger code base might be?
Is there a speed/memory/? difference in the different command calls? Best practices?
Debian Sid: Perl 5.12.4-1.
(This is my first Perl encounter, so don't assume too much about what I "should" know about the language. I just dove into existing code.)
(The larger program is ikiwiki, so the code is not a secret, but I don't know where to look for trouble, and I can't include all the code in this post for practical reasons. This concerns the Mercurial backend.)
As per suggestion from cjm, I added print "$_\n" for sort grep /XML/, keys %INC; which gave output
RPC/XML.pm
RPC/XML/Client.pm
RPC/XML/ParserFactory.pm
XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm
in the large project, and
XML/NamespaceSupport.pm
XML/Parser.pm
XML/Parser/Expat.pm
XML/SAX.pm
XML/SAX/Base.pm
XML/SAX/Exception.pm
XML/SAX/Expat.pm
XML/SAX/ParserFactory.pm
XML/Simple.pm
in the test file.
Update: I installed the Debian package libxml-libxml-perl and added $XML::SAX::ParserPackage = "XML::LibXML::SAX"; as suggested. This also crashed, with a different message this time:
*** stack smashing detected ***: /usr/bin/perl terminated
(full backtrace on pastebin.com)
This time it happened consistently in both the large and the small file, though. Also, only when using open, not when using backticks.
I also installed libxml-libxml-simple-perl, but that is supposed to be little more than a wrapper that always uses XML::LibXML as the parser. It also behaved differently and complained about the options passed to XMLin(), so I discarded it.
Trying to explicitly (and blindly) make the program use each of the alternatives listed by print "$_\n" for sort grep /XML/, keys %INC; seems to confirm that XML::SAX::Expat is used by default, as cjm said (all the other alternatives exit with errors, and XML::SAX::Expat behaves exactly like the original problem in both files; explicitly demanding XML::Simple goes into a loop that allocates all my memory).
I'm thankful for learning about different XML parsers and that XML::Simple automatically chooses different ones. Both parts of my original question somewhat remain though:
Why do the programs behave differently? Even if I explicitly set $XML::SAX::ParserPackage = "XML::SAX::Expat" in both programs, one crashes (using open) and the other works.
Should I use another method to receive output from the external command? Is it even wrong to expect XMLin() to work with open (and if so, why does it work in one case)?
Or are these simply the "wrong" questions to ask (i.e. irrelevant)?
UPDATE: More than a week has passed, not a flurry of activity here, and I solve it a bit differently now, without problems. I mark cjm's answer as correct, since it got me further in the error analysis. Thanks!
XML::Simple is pure-Perl, so it's unlikely to cause the memory corruption you report. It depends on a lower-level XML parser, and it's likely the bug you've encountered is in there. But there are multiple parsers it could be using, and we'd need to know which one.
Try adding this line right after the XMLin line in your sample program, and update your question with the results:
print "$_\n" for sort grep /XML/, keys %INC;
This will tell us which XML parser you're actually using on your system.
Update: Since it looks like you're using XML::Parser (through its SAX interface XML::SAX::Expat), I'd suggest trying XML::LibXML::SAX instead. Libxml2 is considered one of the better XML parsers.
If you don't already have XML::LibXML::SAX installed, just installing it should switch your default SAX parser to it. If it is installed, try putting
$XML::SAX::ParserPackage = "XML::LibXML::SAX";
at the beginning of your program. (See XML::SAX::ParserFactory for how the SAX parser is selected.)
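On the side question of how best to read an external command's output: one common idiom (not from the answers here) is the list form of a piped open, which runs the program directly without a shell and lets you check both the open and the close. In this sketch $^X (the running perl) stands in for the question's hg log -v --style xml so it runs anywhere, and capture_output is a hypothetical helper name. It does not explain the crash itself, which the answer suspects lies in the underlying XML parser.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command (no shell) and return all of its output.
sub capture_output {
    my @cmd = @_;
    open my $fh, '-|', @cmd or die "Cannot run $cmd[0]: $!";
    my $output = do { local $/; <$fh> };    # slurp everything at once
    close $fh or die "$cmd[0] exited with status $?";
    return $output;
}

# $^X stands in for ("hg", "log", "-v", "--style", "xml").
my $xml_text = capture_output($^X, '-e', 'print qq{<log><logentry/></log>}');
print "$xml_text\n";
# With XML::Simple installed, this string could then be passed to XMLin().
```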

Why are Perl source filters bad and when is it OK to use them?

It is "common knowledge" that source filters are bad and should not be used in production code.
When answering a similar, but more specific, question I couldn't find any good references that clearly explain why filters are bad and when they can be used safely. I think it is time to create one.
Why are source filters bad?
When is it OK to use a source filter?
Why source filters are bad:
Nothing but perl can parse Perl. (Source filters are fragile.)
When a source filter breaks pretty much anything can happen. (They can introduce subtle and very hard to find bugs.)
Source filters can break tools that work with source code. (PPI, refactoring, static analysis, etc.)
Source filters are mutually exclusive. (You can't use more than one at a time -- unless you're psychotic).
When they're okay:
You're experimenting.
You're writing throw-away code.
Your name is Damian and you must be allowed to program in Latin.
You're programming in Perl 6.
Only perl can parse Perl (see this example):
@result = (dothis $foo, $bar);
# Which of the following is it equivalent to?
@result = (dothis($foo), $bar);
@result = dothis($foo, $bar);
This kind of ambiguity makes it very hard to write source filters that always succeed and do the right thing. When things go wrong, debugging is awkward.
After crashing and burning a few times, I have developed the superstitious approach of never trying to write another source filter.
I do occasionally use Smart::Comments for debugging, though. When I do, I load the module on the command line:
$ perl -MSmart::Comments test.pl
so as to avoid any chance that it might remain enabled in production code.
See also: Perl Cannot Be Parsed: A Formal Proof
I don't like source filters because you can't tell what code is going to do just by reading it. Additionally, things that look like they aren't executable, such as comments, might magically be executable with the filter. You (or more likely your coworkers) could delete what you think isn't important and break things.
Having said that, if you are implementing your own little language that you want to turn into Perl, source filters might be the right tool. However, just don't call it Perl. :)
It's worth mentioning that Devel::Declare keywords (and starting with Perl 5.11.2, pluggable keywords) aren't source filters, and don't run afoul of the "only perl can parse Perl" problem. This is because they're run by the perl parser itself, they take what they need from the input, and then they return control to the very same parser.
For example, when you declare a method in MooseX::Declare like this:
method frob ($bubble, $bobble does coerce) {
... # complicated code
}
The word "method" invokes the method keyword parser, which uses its own grammar to get the method name and parse the method signature (which isn't Perl, but it doesn't need to be -- it just needs to be well-defined). Then it leaves perl to parse the method body as the body of a sub. Anything anywhere in your code that isn't between the word "method" and the end of a method signature doesn't get seen by the method parser at all, so it can't break your code, no matter how tricky you get.
The problem I see is the same problem you encounter with any C/C++ macro more complex than defining a constant: It degrades your ability to understand what the code is doing by looking at it, because you're not looking at the code that actually executes.
In theory, a source filter is no more dangerous than any other module, since you could easily write a module that redefines builtins or other constructs in "unexpected" ways. In practice, however, it is quite hard to write a source filter in a way where you can prove that it's not going to make a mistake. I tried my hand at writing a source filter that implements the Perl 6 feed operators in Perl 5 (Perl6::Feeds on CPAN). You can take a look at the regular expressions to see the acrobatics required simply to figure out the boundaries of expression scope. While the filter works, and provides a test bed to experiment with feeds, I wouldn't consider using it in a production environment without many, many more hours of testing.
Filter::Simple certainly comes in handy by dealing with 'the gory details of parsing quoted constructs', so I would be wary of any source filter that doesn't start there.
In all, it really depends on the filter you are using and how broad a scope it tries to match against. If it is something simple like a C macro, then it's "probably" OK, but if it's something complicated, then it's a judgement call. I personally can't wait to play around with Perl 6's macro system. Finally Lisp won't have anything on Perl :-)
There is a nice example here that shows in what trouble you can get with source filters.
http://shadow.cat/blog/matt-s-trout/show-us-the-whole-code/
They used a module called Switch, which is based on source filters. And because of that, they were unable to find the source of an error message for days.