What text editor does most accurate job of syntax highlighting Perl - perl

I know I risk asking a speculative question, however, inspired by this recent question I wonder which editor does the best job of syntax highlighting Perl. Being well aware of the difficulties (impossibilities) of parsing Perl I know there will not be a perfect case. Still I wonder if there is a clear leader in faithful representation.
N.B. I use gedit and it works fine, but with known issues.

Komodo Edit does a good job and also scans your modules (including those installed via CPAN) and does well at generating autocomplete data for them.

I'm a loyal vim user and rarely encounter anything odd with the native syntax.vim, except for these cases (I'll edit in more if/when I find them; others please feel free also):
!!expression is better written !!!!expression (everything after two ! is rendered as a comment quoted string; four ! brings everything back to normal)
m## or s### renders everything after the # as a comment; I usually use {} as a delimiter when avoiding / for leaning toothpick syndrome
some edge cases for $hash{key} where key is not a simple alphanumeric string - although it's safer to enclose such key names in '' anyway so as to not have to look up the exact cases for when a bareword is treated as a key name

I haven't used it, but Padre should be good since it's written in Perl. IIRC It uses PPI for parsing

jEdit...with the tweaks that I have amassed over the years. It's got the most customizable syntax highlighting I've ever seen.

I use Emacs in CPerl mode. I think it does a terrific job, although similar to Ether's answer, it's not perfect. What's more, I usually use Htmlize to publish highlighted code to the web. It's kind of annoying to use fancier forums like this one that do their own syntax highlighting, since it's not really any easier and the results aren't as good.

Related

How to find literals in source code of Smartforms and in SAPScripts (or reports, if the others can't be done)

I'd like to check hardcoded values in (a lot of) Smartforms and SAPScript forms.
I have found a way to read the source code of both of these, but it seems that i will have to go through a lot of parsing before I get anything reliable.
I've come across function module GET_LITERAL but that doesn't seem to help me much since i have to specify the offset of the value, if i got right what the function is doing in the first place.
I also found RS_LITERAL_LIST but that also doesn't do what i expect.
I also tried searching for reports and methods, but haven't found anything that seemed to help.
A backup plan would be to get some good parsing tool, so do you know of anything like that.
Anyway, any hints would be helpful and appreciated.
[EDIT]
Forgot to mention, the version of my system is 4.6C
If you have a fairly recent version of ABAP, you can use a regex.
Follow the pattern of this example, but use your source as the text and create your own regex. Have it look for any single quotes on the end of a word separated by spaces or any integers with spaces on either side. That's just a start, you might need to work on a better pattern.
String functions count, find, and match

General check of missing semicolon

As a Perl beginner I am sometimes getting compilation errors and have to search a lot to find it. In the end it is just a missing semicolon at the end of a line. Some syntax errors with missing semicolon are checked by Perl but not in general. Is there a way to get this check?
edit:
I know about Perl::Critic but can't use it atm. And I don't know if it checks for missing semicolon in general.
Because semicolons actually mean something in Perl and aren't just there for decoration, it's not possible for any tool (even the Perl interpreter itself) to know in every case whether you actually meant to leave off the semi-colon or not. Thus, there's no general-case answer to your question; you'll just need to go through your code and make sure it's correct.
As mentioned in my comments, there are various tricks you can try with your editor to expedite the process of finding potentially-incorrect lines; you must, however, either examine and fix these by hand or risk introducing new problems.
The syntax check is perl -c, but that's no different than attempting to run the program outright. Due to its flexible/undecidable syntax, one cannot generally do what you want. That's the downside of comfort and expressiveness.
Upgrade to the latest stable Perl, the parser's error messages got better/more exact over the last years and will correctly recognise many circumstances of a missing semicolon.
Rule of thumb that works for many parsers/other languages: if the error makes no sense, look a couple of lines before.
use diagnostics; usually gives you a nice hint, same as use warnings;. Try to keep a consistent coding style, check perlstyle.
Also you can use Perl::Critic online.
Also as general advice learn how to use packages and modules, try to group code into subs and study the syntax of arrays, lists and hashes. A common mistake is forgetting the ; after an anonymous hashref assignment:
my $hashref = { a => 5, b => 10};

How to test/classify CPAN modules for utf8 correctness

Here is an excellent question and the wonderful tchrist's answer with 7+24+52 advices&comments how to make an perl program utf8 safe.
But here is 19k CPAN modules. What is possible to do for differentiating "good" and "bad" ones? (from the utf8's point of view)
For example: File::Slurp if you will read the file with
#use strict encoding warnings utf8 autodie... etc....
my $str = read_file($file, binmode => ':utf8');
you will get different results based on command line switches, and perl -CSDA will not work. Sad. (Yes, i know than Encode::decode("utf8", read_file($file, binmode => ':raw')); will help, but SAD anyway.
My questions:
is here any preferred way, how to test/classify what CPAN modules are utf8 safe/ready/correct?
is here some Test::something already done for utf8 testing?
is here something like Perl::Critic for utf8 - what will check the module source for possible utf8 incorrectness? (because manually checking sources for 7+24+52 things i cannot classify as the "easy way to programming")
or any other way? :)
I understand, than much of CPAN modules simply does not need to know about utf8. But here are zilion others what should.
Please, don't misunderstand me. I love Perl language. I know than perl has extremely powerful utf8 capability. (especially 5.14). The above was not mean as perl critique - but me (and probably some others too) need to know what CPAN modules are OK, and how classify them...)
When doing development using several CPAN modules, and initially everything goes well but in the final testing, you find that some modules does not support utf8, and therefore part of your work is useless - that really can cause a bit disillusionment. :(
Edit:
I understand than all complicated things around the unicode has two roots:
unicode itself - as tchrist excellently analyzed some of problematic points
perl - simple can't break all working modules, live servers etc - so need maintaining backward compatibility.
My only hope: perl6. Is is an totally new and different language. Don't need maintaining any backward compatibility. So I hope, in perl6 will be default some things what is not possible do in perl5 and all utf8 things will be much more intuitive.
But, back to modules: #daxim told: "Authors won't even reveal whether their module is taint-safe, and this feature exists for decades!" - and this is a catastrophe. Maybe (big maybe, and honestly haven't idea how to do it), but maybe we arrived to the time, when need put much-much harder restrictions into CPAN submissions.
At one side i'm really very happy with volunteer works of CPAN authors. At the other side, publishing source code is not only like a free speech "right" - but should obey some rules too.
I understand, than is is nearly impossible make any "revolution", but we probably need some "evolution". Maybe flag any CPAN module what is not utf8 safe. Flag all what are not taint safe. Flag (like here in SO) what module does not meet the minimal coding standards and remove them. Maybe I'm an idealist and/or naive. :)
Chill, the situation is less dire than you're thinking. No one except tchrist operates on this level of Unicode correctness, also see Aristotle's recent commentary. As with all things, you get 80% of the way with 20% of the effort. This base effort, namely getting the topic of character encoding right, is well documented; and jrockway repeats it in his answer in that thread.
Replies to your specific questions:
No, there isn't. There is no concerted effort to collect this information in a central place. The Perl 5 wiki could be used to document problematic modules, Juerd already discusses some in uniadvice. I would really like to see a statement from each module author in their documentation that "this module DTRT w.r.t. encoding", but I don't see it happening. Authors won't even reveal whether their module is taint-safe, and this feature exists for decades!
encoding::warnings can be used to smoke out unintended upgrades. I mention it in the work-flow of Checklist for going the Unicode way with Perl
You can't do that with Perl::Critic or static analysis. I see no other way than knowledgable people poking at the module with pointy characters until it falls apart (or not), like mirod just commented.

What is the reason behind not having a simpler multi-line comment in Perl?

I know different methods of introducing multi-line comments in Perl. But I want to know why it doesn't have something simpler multi-line comment feature such /* comment */, which would make it much easier.
I currently follow http://www.perlmonks.org/?node_id=560985 to achieve multi-line comments. Are there any plans to include this feature in Perl anytime soon?
C-style comments come with a variety of difficult problems, including the inability to comment-out comments. Additionally, convenient syntax for one-line encourages concise writing, which makes your code easier to read. The absence of 10-line "Author: ... Input: ... Output: ... Favorite Color: ..." blocks that you see from time to time in C and C++ is another benefit.
Multi-line comments encourage long writing that is better expressed more concisely or as documentation, so this is what Perl encourages with its # and =pod operators (respectively).
Finally, if you are having trouble with Perl's style of commenting, you should configure your editor to edit programs, not text. In Emacs, if you start writing something like # this is a comment that is getting longer and longer and longer and oh my goodness we are out of space on this line what to do now and type M-q, Emacs will automatically insert the # at the beginning of each line that it creates.
If you just want to comment-out a block of code, then you need look no further than M-x comment-region and M-x uncomment-region, which will comment out the region in pretty-much any language; including Perl.
Don't stress about the syntax of comments; that's the computer's job!
There was a discussion regarding this on the perl6-language#perl.org mailing list. Although in the end the discussion was inconclusive, the summary posted here makes for interesting reading.
As with any multiline comment structure, there will be a "open" and a "close" comment condition, and that leads to problems with nested comments.
for example, C uses /* as the open and */ as the close. How does a multiline comment system handle comments within comments? C will fail if you try to comment a block that is already commented.
Note that line-based comments (e.g. c++ // comments) do not suffer this problem.
Simplicity is in the eye of the beholder. That said, there are already a number of ways to do multi-line comments. First, void string literals:
q{This text won't do anything.
Neither will this.};
This has the unfortunate side effect of triggering a warning:
Useless use of a constant in void context at - line 4.
Another option is using a heredoc in void context - for some reason this doesn't cause a warning:
<<ENDCOMMENT;
Foo comment bar.
ENDCOMMENT
As you can see, the problem isn't the lack of syntax as such (python doc comments look vaugely similar to the q{} method in fact) - it's more just that the community of perl programmers has settled on line comments using # and POD. Using a different method at this point will just confuse people who have to read your code later.

Why is it bad to put a space before a semicolon?

The perlstyle pod states
No space before the semicolon
and I can see no reason for that. I know that in English there should not be any space before characters made of 2 parts ( like '?',';','!' ), but I don't see why this should be a rule when writing Perl code.
I confess I personally use spaces before semicolons. My reason is that it makes the statement stands up a bit more clearer. I know it's not a very strong reason, but at least it's a reason.
print "Something\n with : some ; chars"; # good
print "Something\n with : some ; chars" ; # bad??
What's the reason for the second being bad?
From the first paragraph of the Description section:
Each programmer will, of course, have his or her own preferences in regards to formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.
And from the third paragraph of the Description section:
Regarding aesthetics of code lay out, about the only thing Larry cares strongly about is that the closing curly bracket of a multi-line BLOCK should line up with the keyword that started the construct. Beyond that, he has other preferences that aren't so strong:
It's just a convention among Perl programmers for style. If you don't like it, you can choose to ignore it. I would compare it to Sun's Java Style guidelines or the suggestions for indenting in the K&R C book. Some environments have their own guidelines. These just happen to be the suggestions for Perl.
As Jon Skeet said in a deleted answer to this question:
If you're happy to be inconsistent with what some other people like, then just write in the most readable form for you. If you're likely to be sharing your code with others - and particularly if they'll be contributing code too - then it's worth trying to agree some consistent style.
This is only my opinion, but I also realize that people read code in different ways so "bad' is relative. If you are the only person who ever looks at your code, you can do whatever you like. However, having looked at a lot of Perl code, I've only seen a couple of people put spaces before statement separators.
When you are doing something that is very different from what the rest of the world is doing, the difference stands out to other people because their brain don't see it in the same way it. Conversely, doing things differently makes it harder for you to read other people's code for the same reason: you don't see the visual patterns you expect.
My standard is to avoid visual clutter, and that I should see islands of context. Anything that stands out draws attention, (as you say you want), but I don't need to draw attention to my statement separators because I usually only have one statement per line. Anything that I don't really need to see should fade into the visual background. I don't like semi-colons to stand out. To me, a semicolon is a minor issue, and I want to reduce the number of things my eyes see as distinct groups.
There are times where the punctuation is important, and I want those to stand out, and in that case the semicolon needs to get out of the way. I often do this with the conditional operator, for instance:
my $foo = $boolean ?
$some_long_value
:
$some_other_value
;
If you are a new coder, typing that damned statement separator might be a big pain in your life, but your pains will change over time. Later on, the style fad you choose to mitigate one pain becomes the pain. You'll get used to the syntax eventually. The better question might be, why don't they already stand out? If you're using a good programmer font that has heavier and bigger punctuation faces, you might have an easier time seeing them.
Even if you decide to do that in your code, I find it odd that people do it in their writing. I never really noticed it before Stackoverflow, but a lot of programmers here put spaces before most punctuation.
It's not a rule, it's one of Larry Wall's style preferences. Style preferences are about what help you and the others who will maintain your code visually absorb information quickly and accurately.
I agree with Larry in this case, and find the space before the semicolon ugly and disruptive to my reading process, but others such as yourself may find the exact opposite. I would, of course, prefer that you use the sort of style I like, but there aren't any laws on the books about it.
Yet.
Like others have said, this is a matter of style, not a hard and fast rule. For instance, I don't like four spaces for indentation. I am a real tab for block level indentation/spaces for lining things up sort of programmer, so I ignore that section of perlstyle.
I also require a reason for style. If you cannot clearly state why you prefer a given style then the rule is pointless. In this case the reason is fairly easy to see. Whitespace that is not required is used to call attention to something or make something easier to read. So, does a semicolon deserve extra attention? Every expression (barring control structures) will end with a semicolon, and most expressions fit on one line. So calling attention to the expected case seems to be a waste of a programmers time and attention. This is why most programmers indent a line that is a continuation of an expression to call attention to the fact that it didn't end on one line:
open my $fh, "<", $file
or die "could not open '$file': $!";
Now, the second reason we use whitespace is make something easier to read. Is
foo("bar") ;
easier to read than
foo("bar");
I would make the claim that is harder to read because it is calling my attention to the semicolon, and I, for the most part, don't care about semicolons if the file is formatted correctly. Sure Perl cares, and if I am missing one it will tell me about it.
Feel free to put a space. The important thing is that you be consistent; consistency allows you to more readily spot errors.
There's one interesting coding style that places , and ; at the beginning of the following line (after indentation). While that's not to my taste, it works so long as it is consistent.
Update: an example of that coding style (which I do not advocate):
; sub capture (&;*)
{ my $code = shift
; my $fh = shift || select
; local $output
; no strict 'refs'
; if ( my $to = tied *$fh )
{ my $tc = ref $to
; bless $to, __PACKAGE__
; &{$code}()
; bless $to, $tc
}
else
{ tie *$fh , __PACKAGE__
; &{$code}()
; untie *$fh
}
; \$output
}
A defense can be found here: http://perl.4pro.net/pcs.html.
(2011 Update: that page seems to have gone AWOL; a rescued copy can be seen here: http://ysth.info/pcs.html)
Well, it's style, not a rule. Style rules are by definition fairly arbitrary. As for why you shouldn't put spaces before semicolons, it's simply because that's The Way It's Done. Not just with Perl, but with C and all the other curlies-and-semicolons languages going back to C and newer C-influenced ones like C++, C#, Objective C, Javascript, Java, PHP, etc.
Because people don't expect it . It looks strange when you do it .
The reason I would cite is to be consistent within a project. I have been in a project where the majority of programmers would not insert the space but one programmer does. If he works on a defect he may routinely add the space in the lines of code he is examining as that is what he likes and there is nothing in the style guide to say otherwise.
The visual diff tools in use can't determine this so show a mass of line changes when only one may have changed and becomes difficult to review. (OK this could be an argument for a better diff tools but embedded work tends to restrict tool choice more).
Maybe a suitable guide for this would be to choose whichever format you want for you semicolon but don't change others unless you modify the statement itself.
I really don't like it. But it's 100% a personal decision and group convention you must make.
Code style is just a set of rules to make reading and maintaining code easier.
There is no real bad style, but some are more accepted than others. And they are of course the source for some "religious battles" (to call curly braces style ;-) ).
To make a real life comparison, we have trafic lights with red, yellow/orange and green. Despite the psychological effects of the colors, it is not wrong to use Purple, Brown and Pink, but just because we are all used to the colors there are less trafic accidents.