Why do Perl control statements require braces? - perl

This may look like the recent question that asked why Perl doesn't allow one-liners to be "unblocked," but I found the answers to that question unsatisfactory because they either referred to the syntax documentation that says that braces are required, which I think is just begging the question, or ignored the question and simply gave braceless alternatives.
Why does Perl require braces for control statements like if and for? Put another way, why does Perl require blocks rather than statements, like some other popular languages allow?

One reason could be that some styles dictate that you should always use braces with control structures, even for one liners, in order to avoid breaking them later, e.g.:
if (condition)
myObject.doSomething();
else
myObject.doSomethingElse();
Then someone adds something more to the first part:
if (condition)
myObject.doSomething();
myObject.doSomethingMore(); // Syntax error next line
else
myObject.doSomethingElse();
Or worse:
if (condition)
myObject.doSomething();
else
myObject.doSomethingElse();
myObject.doSomethingMore(); // Compiles, but not what you wanted.
In Perl, these kinds of mistakes are not possible, because not using braces with control structures is always a syntax error. In effect, a style decision has been enforced at the language syntax level.
Whether that is any part of the real reason, only Larry's moustache knows.

One reason could be that some constructs would be ambiguous without braces :
foreach (#l) do_something unless $condition;
Does unless $condition apply to the whole thing or just the do_something statement?
Of course this could have been worked out with priority rules or something,
but it would have been yet another way to create confusing Perl code :-)

One problem with braceless if-else clauses is they can lead to syntactic ambiguity:
if (foo)
if (bar)
mumble;
else
tumble;
Given the above, under what condition is tumble executed? It could be interpreted as happening when !foo or foo && !bar. Adding braces clears up the ambiguity without dirtying the source too much. You could then go on to say that it's always a good idea to have the braces, so let's make the language require it and solve the endless C bickering over whether they should be used or not. Or, of course, you could address the problem by getting rid of the braces completely and using the indentation to indicate nesting. Both are ways of making clear, unambiguous code a natural thing rather than requiring special effort.

In Programming Perl (which Larry Wall co-authored), 3rd Edition, page 113, compound statements are defined in terms of expressions and blocks, not statements, and blocks have braces.
Note that unlike in C and Java,
[compound statements] are defined in
terms of BLOCKS, not statements.
This means that the braces are
requried--no dangling statements
allowed.
I don't know if that answers your question but it seems like in this case he chose to favor a simple language structure instead of making exceptions.

Perhaps not directly relevant to your question about (presumably) Perl 5 and earlier, but…
In Perl 6, control structures do not require parentheses:
if $x { say '$x is true' }
for <foo bar baz> -> $s { say "[$s]" }
This would be horrendously ambiguous if the braces were also optional.

Isn't it that Perl allows you to skip the braces, but then you have to write statement before condition? i.e.
#!/usr/bin/perl
my $a = 1;
if ($a == 1) {
print "one\n";
}
# is equivalent to:
print "one\n" if ($a == 1);

"Okay, so normally, you need braces around blocks, but not if the block is only one statement long, except, of course, if your statement would be ambiguous in a way that would be ruled by precedence rules not like you want if you omitted the braces -- in this case, you could also imagine the use of parentheses, but that would be inconsistent, because it is a block after all -- this is of course dependent on the respective precedence of the involved operators. In any case, you don't need to put semicolons after closing braces -- it is even wrong if you end an if statement that is followed by an else statement -- except that you absolutely must put a semicolon at the end of a header file in C++ (or was it C?)."
Seriously, I am glad for every explicitness and uniformity in code.

Just guessing here, but "unblocked" loops/ifs/etc. tend to be places where subtle bugs are introduced during code maintenance, since a sloppy maintainer might try to add another line "inside the loop" without realizing that it's not really inside.
Of course, this is Perl we're talking about, so probably any argument that relies on maintainability is suspect... :)

Related

Safe.pm base_math as math calculator and new variables

I do not want to write my own recursive-descent math parser or think too deeply about grammar, so I am (re-)using the Perl module Safe.pm as an arithmetic calculator with variables. My task is to let one anonymous web user A type into a textfield a couple of math expressions, like:
**Input Formula:** $x= 2; $y=sqrt(2*$x+(25+$x)*$x); $z= log($y); ...
Ideally, this should only contain math expressions, but not generic Perl code. Later, I want to use it for web user B:
**Input Print:** you start with x=$x and end with z=$z . you don't know $a.
to <pre> text output that looks like this:
**Output Txt:** you start with x=2 and end with z=2.03 . you don't know $a.
(The fact that $a was not replaced is its own warning.) Ideally, I want to check that my web users have not only not tried to break in, but also have made no syntax errors.
My current Safe.pm-based implementation has drawbacks:
I want only math expressions in the first textfield. Alas, :base_math only extends Safe.pm beyond :base_core, so I have to live with the user having access to more than just math algebra expressions. For example, the web users could accidentally try to use a Perl reserved name, define subs, or do who knows what. Is there a better solution that picks off only the recursive descent math grammar parser? (and, subs like system() should not be permitted math functions!)
For the printing, I can just wrap a print "..." around the text and go another Safe eval, but this replaces $a with undef. What I really mean my code to do is to go through the table of newly added variables ($x, $y, and $z) and if they appear unescaped, then replace them; others should be ignored. I also have to watch carefully here that my guys are not working together to try to escape and type text like "; system("rm -rf *"); print ";, though Safe would catch this particular issue. More likely, A could try to inject some nasty JavaScript for B or who knows what.
Questions:
Is Safe.pm the right tool for the job? Perl seems like a heavy cannon here, but not having to reinvent the wheel is nice.
Can one further restrict Safe.pm to Perl's arithmetic only?
Is there a "new symbols" table that I can iterate over for substitution?
Safe.pm seems like a bad choice, because you're going to run the risk of overlooking some exploitable operation. I would suggest looking at a parsing tool, such as Marpa. It even has the beginnings of a calculator implementation which you could probably adapt to your purposes.

Why doesn't this run forever?

I was looking at a rather inconclusive question about whether it is best to use for(;;) or while(1) when you want to make an infinite loop and I saw an interesting solution in C where you can #define "EVER" as a constant equal to ";;" and literally loop for(EVER).
I know defining an extra constant to do this is probably not the best programming practice but purely for educational purposes I wanted to see if this could be done with Perl as well.
I tried to make the Perl equivalent, but it only loops once and then exits the loop.
#!/usr/bin/perl -w
use strict;
use constant EVER => ';;';
for (EVER) {
print "FOREVER!\n";
}
Output:
FOREVER!
Why doesn't this work in perl?
C's pre-processor constants are very different from the constants in most languages.
A normal constant acts like a variable which you can only set once; it has a value which can be passed around in most of the places a variable can be, with some benefits from you and the compiler knowing it won't change. This is the type of constant that Perl's constant pragma gives you. When you pass the constant to the for operator, it just sees it as a string value, and behaves accordingly.
C, however, has a step which runs before the compiler even sees the code, called the pre-processor. This actually manipulates the text of your source code without knowing or caring what most of it means, so can do all sorts of things that you couldn't do in the language itself. In the case of #DEFINE EVER ;;, you are telling the pre-processor to replace every occurrence of EVER with ;;, so that when the actual compiler runs, it only sees for(;;). You could go a step further and define the word forever as for(;;), and it would still work.
As mentioned by Andrew Medico in comments, the closest Perl has to a pre-processor is source filters, and indeed one of the examples in the manual is an emulation of #define. These are actually even more powerful than pre-processor macros, allowing people to write modules like Acme::Bleach (replaces your whole program with whitespace while maintaining functionality) and Lingua::Romana::Perligata (interprets programs written in grammatically correct Latin), as well as more sensible features such as adding keywords and syntax for class and method declarations.
It doesn't run forever because ';;' is an ordinary string, not a preprocessor macro (Perl doesn't have an equivalent of the C preprocessor). As such, for (';;') runs a single time, with $_ set to ';;' that one time.
Andrew Medico mentioned in his comment that you could hack it together with a source filter.
I confirmed this, and here's an example.
use Filter::cpp;
#define EVER ;;
for (EVER) {
print "Forever!\n";
}
Output:
Forever!
Forever!
Forever!
... keeps going ...
I don't think I would recommend doing this, but it is possible.
This is not possible in Perl. However, you can define a subroutine named forever which takes a code block as a parameter and runs it again and again:
#!/usr/bin/perl
use warnings;
use strict;
sub forever (&) {
$_[0]->() while 1
}
forever {
print scalar localtime, "\n";
sleep 1;
};

Coffeescript whitespace, arguments and function scopes

I've been using CoffeeScript for a while. I find it a good language overall, certainly better than plain JS, but I find I'm still baffled by its indentation rules. Take this example:
Bacon.mergeAll(
#searchButton.asEventStream('click')
#searchInput.asEventStream('keyup')
.filter (e) => e.keyCode is 13
)
.map =>
#searchInput.val()
.flatMapLatest (query) =>
Bacon.fromPromise $.ajax
url: #searchURL + encodeURI query
dataType: 'jsonp'
This does what it should (the code is based on this tutorial, btw) but it took me a lot of trial and error to get it right.
Why do mergeAll and asEventStream require parentheses around their arguments? Why is indentation not enough to determine where their argument lists begin and end? OTOH, why is indentation enough for map and flatMapLatest? Why is the whitespace before a hanging method, such as .filter (its indentation level) not enough to determine what it binds to? It seems to be completely ignored.
Is there a definitive guide to this language's indentation rules? I never had a problem understanding Python syntax at a glance, even with very complex nesting, so it's not an issue with indentation-based syntax per se.
Indentation in CoffeeScript generally defines blocks, and argument lists aren't (necessarily) blocks. Similarly, a chained function call isn't a block; CoffeeScript simply sees a line starting with . and connects it to the previous line of similar or lower indentation.
Hence, the parentheses are needed for asEventStream, since CoffeeScript would otherwise see:
#searchInput.asEventStream 'keyup'.filter (e) => e.keyCode is 13
Which would call filter on the 'keyup' string, and it'd remain ambiguous whether the function is an argument to filter, or an argument to #searchInput.asEventStream('keyup'.filter)(). That last bit obviously doesn't make much sense, but CoffeeScript isn't a static analyzer, so it doesn't know that.
A function, meanwhile, is a block, hence the function argument to .map() works without parentheses, since it clearly delimited by its indentation. I.e. the line following the function has less indentation.
Personally, I'd probably write
Bacon.mergeAll(
#searchButton.asEventStream('click'), # explicit comma
#searchInput.asEventStream('keyup').filter (e) -> e.keyCode is 13 # no need for =>
)
.map(=> #searchInput.val()) # maybe not as pretty, but clearer
.flatMapLatest (query) =>
Bacon.fromPromise $.ajax
url: #searchURL + encodeURI query
dataType: 'jsonp'
In fact, I might break it up into separate expressions to make it clearer still. Insisting on the syntactic sugar while chaining stuff can indeed get confusing in CoffeeScript, but remember that you're not obliged to use it. Same as you're not obliged to always avoid parentheses; if they make things clearer, by all means use 'em!
If the code's easier to write, less ambiguous to read, and simpler to maintain without complex chaining/syntax (all of which seems true for this example), then I'd say just skip it.
In the end, there just are combinations of indentation syntax in CoffeeScript that can make either you or the compiler trip. Mostly, though, if you look at something, and find it straightforward, the compiler probably thinks so too. If you're in doubt, the compiler might be too, or it'll interpret it in unexpected ways. That's the best I can offer in terms of "definitive guide" (don't know of a written one).
Have you looked at the Javascript produced by this code? What happens when you omit ().
In Try Coffeescript I find that:
#searchButton.asEventStream 'click'
is ok. The second asEventStream compiles to:
this.searchInput.asEventStream('keyup').filter(function(e) {
but omitting the () changes it to:
this.searchInput.asEventStream('keyup'.filter(function(e) {
filter is now an attribute of 'keyup'. Putting a space to separate asEventStream and ('keyup') does the same thing.
#searchInput.asEventStream ('keyup')
As written .mergeAll() produces:
Bacon.mergeAll(...).map(...).flatMapLatest(...);
Omitting the ()
Bacon.mergeAll
#searchButton.asEventStream('click')
#searchInput.asEventStream('keyup')
gives an error because the compiler has no way of knowing that mergeAll is a function that takes arguments. It has no reason to expect an indented block.
Coming from Python, my inclination is to continue to use (),[],{} to mark structures like arguments, arrays and objects, unless the code is clearer without them. Often they help me read the code, even if the compiler does not need them. Coffeescript is also like Python in the use of indentation to denote code blocks (as opposed to the {} used in Javascript and other C styled languages).

Is this trivial function silly?

I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
my $string = shift;
my $regex = shift;
my $replace = shift;
$string =~ s/$regex/$replace/gi;
return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn what it means once, and its not like its some weird corner feature. replace_string everyone has to learn the details of, and its going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// its hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.

Usage of ‘if’ versus ‘unless’ for Perl conditionals

What are some guidelines for the best use of if versus unless in Perl code? Are there strong reasons to prefer one or the other in some situations?
In Perl Best Practices, the advice is to never use unless. Personally, I think that's lunacy.
I use unless whenever there's a simple condition that I would otherwise write as if( ! ... ). I find the unless version to be more readable, especially when used as a postfix:
do_something() unless $should_not_do_that;
I recommend avoiding unless anytime things get more complicated, such as when you will have elsif or else blocks. (Fortunately, or perhaps unfortunately, depending on your perspective, there is no elsunless)
Also, any time a conditional is a complex expression made up of other booleans. For example,
unless( $foo and !$bar )
Is pretty damn confusing, and does not have any advantage over the equivalent if.
Aside from one esoteric case1 unless is just syntactic sugar for if !. It exists to allow you to write code that is clearer and more expressive. It should be used when it achieves this goal and shunned when it impairs it.
I find unless to be most useful for flow control in loops. e.g.
while (<$fh>) {
next unless /\S/;
# ...
}
For simple negations I find it clearer than a negated if -- it's easy to miss that leading ! when reading code.
unless ($condition) {
do_something();
}
if (!$condition) {
do_something();
}
But don't write unless ... else, because that's just jarring.
In postfix form it provides a hint about what the expected path through the code is.
do_normal_thing() unless $some_unlikely_condition;
1) The last expression evaluated is different, which can affect the behavior of subs without an explicit return.
A rule of thumb is that 'unless' should probably be used infrequently.
It is particularly useful in the postfix form, e.g.:
delete_old_widgets() unless $old_widget_count == 0
Situations where you should never use unless:
with a compound condition (and, or, not)
with an else clause
I spent an hour recently trying to explain to someone how two nested 'unless' clauses worked, it was difficult to decypher without having to invert them into if statements with boolean logic.
If you try to convert it into English, it helps to guide you.
A simple unless works fine. For example.
'Unless you are quiet, I am going to ignore you'.
unless ($quiet) {
ignore();
}
Although I think this works equally well
'If you are not quiet, I am going to ignore you'
if (not $quiet) {
ignore();
}
when it starts getting complicated is when you have negation.
'Unless you are not noisy, I am going to ignore you'
unless ( ! $noisy) {
ignore();
}
Far better written as
'If you are noisy, I am going to ignore you'
if ($noisy) {
ignore();
}
So, don't use an 'unless' if you also have a negate.
Also don't use 'unless else'
unless ($quiet) {
ignore();
}
else {
give_a_sweet();
}
'Unless you are quiet, I will ignore you, otherwise I will give you a sweet'
Change it by inverting the condition.
if ($quiet) {
give_a_sweet();
}
else {
ignore();
}
'If you are quiet, I will give you a sweet, otherwise I will ignore you'.
with more than one condition, it get's messy.
unless ($quiet and not $fidgit) {
punish();
}
'Unless you are quiet and you don't fidgit, I will punish you'.
(sorry my comprehension is failing here!)
again, negate it.
if (not $quiet or $fidgit) {
punish();
}
'If you are not quiet, or you fidgit, I will punish you'.
the problem with using 'unless' even for the simplest cases, they often (by yourself or by outhe
I hope that makes it clear when you should, or should not, use unless?
(unless you don't have another opinion?)
Though I understand the motivations for these sorts of questions I don't believe it's really sensible to boil down the use of something like unless into concrete rules of thumb. This, like a lot of auxiliary syntax in Perl is provided as a minor convenience for programmers to help them articulate themselves clearer at their own discretion; I've seen as much said in more ways than one by the lead developers throughout "Programming Perl". There's no higher purpose or rationalization. I'm well aware this finesses the question, but the only constraint I'd impose on using it is to see that it serves its broader purpose of making the code clearer. If it does, then all is well. Recognizing whether code is understandable is intuitive in itself and can't be reduced to a large handful of overly generalized usage conditions concerning every nuance of the built-in modifiers/operators/overall syntax, and where constraints and guidelines are needed in large group projects I don't think it would make sense to split hairs this finely.
My opinion would be to never use unless. My reasons are:
I think the syntax makes code more difficult to read. Having a single method of doing an if makes thing simpler and more consistent.
If you need to add an else statement later on you should really change the unless to an if. It's just easier if it is already an if.
If the logic inside the unless statement becomes more complex then you can end up with odd code like "unless (x == 5 && y != 7). This is odd because there is a double negative on the second check.
There are other ways to negate things, ie x != 5
It's more consistent with other languages. I don't know of any other language that has an unless statement and I think there is a very good reason for this.
In perl there are really 4 ways to write an if statement, if, unless and then putting the check at the end of the line instead of the start. I much prefer a single consistent method that is consistent with other langauges also.
Just my $0.02 worth.
Apropos...
In Perl Best Practices, the advice is to never use unless. Personally,
I think that's lunacy.
After 20+ years preferring Perl over any of its alternatives (most of which would not exist had not Perl provided the template), I not only concur with the 'lunacy' verdict, I'm surprised (concerned) to hear that 'Best Practices' wants to get rid of it.
That said, I strongly prefer code written for clarity, as opposed to the implicitly obfuscated alternatives some Perl programmers adopt 'just because they can'. 'unless' is the unambiguous opposite of 'if', and therefore a very useful alternative to embedding a negation in an 'if' conditional, particularly if the conditional contains multiple operators. And that rationale applies even if followed by else/elsif.
Just a personal opinion maybe, but I like to use Unless when the if condition would start with a !
The syntax if(!$condition) is equivalent to unless($condition), and likewise if you swap the NOT'ing of the condition.
I personally prefer using only IF statements. Why? Because it's less to memorize. If you have 100 lines of code, half of it using unless() and the other half using if(), you'll need to spend more time debugging it than if it was just if() statements.
You may "get" the hang of unless and if, though, so that going between the two takes no time at all. But it's not just about if() versus unless(). Don't forget that...
if($condition) {
#passed-condition code
} else {
#failed-condition code
}
...is equivalent to...
unless(!$condition) {
#passed-condition code
} else {
#failed-condition code
}
But what about if(){...}elsif(){...}? How would you make an equivalent of that into unless(){...}elsunless(){...}? The more complicated the logic, the more difficult it becomes to go between if() and unless(). The less you have to remember and balance in your memory, the quicker you will be to debug your own code.