I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
my $string = shift;
my $regex = shift;
my $replace = shift;
$string =~ s/$regex/$replace/gi;
return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn what it means once, and its not like its some weird corner feature. replace_string everyone has to learn the details of, and its going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// its hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.
Related
I do not want to write my own recursive-descent math parser or think too deeply about grammar, so I am (re-)using the Perl module Safe.pm as an arithmetic calculator with variables. My task is to let one anonymous web user A type into a textfield a couple of math expressions, like:
**Input Formula:** $x= 2; $y=sqrt(2*$x+(25+$x)*$x); $z= log($y); ...
Ideally, this should only contain math expressions, but not generic Perl code. Later, I want to use it for web user B:
**Input Print:** you start with x=$x and end with z=$z . you don't know $a.
to <pre> text output that looks like this:
**Output Txt:** you start with x=2 and end with z=2.03 . you don't know $a.
(The fact that $a was not replaced is its own warning.) Ideally, I want to check that my web users have not only not tried to break in, but also have made no syntax errors.
My current Safe.pm-based implementation has drawbacks:
I want only math expressions in the first textfield. Alas, :base_math only extends Safe.pm beyond :base_core, so I have to live with the user having access to more than just math algebra expressions. For example, the web users could accidentally try to use a Perl reserved name, define subs, or do who knows what. Is there a better solution that picks off only the recursive descent math grammar parser? (and, subs like system() should not be permitted math functions!)
For the printing, I can just wrap a print "..." around the text and go another Safe eval, but this replaces $a with undef. What I really mean my code to do is to go through the table of newly added variables ($x, $y, and $z) and if they appear unescaped, then replace them; others should be ignored. I also have to watch carefully here that my guys are not working together to try to escape and type text like "; system("rm -rf *"); print ";, though Safe would catch this particular issue. More likely, A could try to inject some nasty JavaScript for B or who knows what.
Questions:
Is Safe.pm the right tool for the job? Perl seems like a heavy cannon here, but not having to reinvent the wheel is nice.
Can one further restrict Safe.pm to Perl's arithmetic only?
Is there a "new symbols" table that I can iterate over for substitution?
Safe.pm seems like a bad choice, because you're going to run the risk of overlooking some exploitable operation. I would suggest looking at a parsing tool, such as Marpa. It even has the beginnings of a calculator implementation which you could probably adapt to your purposes.
Is there a good reason for map to not read from #_ (in functions) or #ARGV (anywhere else) when not given an argument list?
I can't say why Larry didn't make map, grep and the other list functions operate on #_ like pop and shift do, but I can tell you why I wouldn't. Default variables used to be in vogue, but Perl programmers have discovered that most of the "default" behaviors cause more problems than they solve. I doubt they would make it into the language today.
The first problem is remembering what a function does when passed no arguments. Does it act on a hidden variable? Which one? You just have to know by rote, and that makes it a lot more work to learn, read and write the language. You're probably going to get it wrong and that means bugs. This could be mitigated by Perl being consistent about it (ie. ALL functions which take lists operate on #_ and ALL functions which take scalars operate on $_) but there's more problems.
The second problem is the behavior changes based on context. Take some code outside of a subroutine, or put it into a subroutine, and suddenly it works differently. That makes refactoring harder. If you made it work on just #_ or just #ARGV then this problem goes away.
Third is default variables have a tendency to be quietly modified as well as read. $_ is dangerous for this reason, you never know when something is going to overwrite it. If the use of #_ as the default list variable were adopted, this behavior would likely leak in.
Fourth, it would probably lead to complicated syntax problems. I'd imagine this was one of the original reasons keeping it from being added to the language, back when $_ was in vogue.
Fifth, #ARGV as a default makes some sense when you're writing scripts that primarily work with #ARGV... but it doesn't make any sense when working on a library. Perl programmers have shifted from writing quick scripts to writing libraries.
Sixth, using $_ as default is a way of chaining together scalar operations without having to write the variable over and over again. This might have been mitigated if Perl was more consistent about its return values, and if regexes didn't have special syntax, but there you have it. Lists can already be chained, map { ... } sort { ... } grep /.../, #foo, so that use case is handled by a more efficient mechanism.
Finally, it's of very limited use. It's very rare that you want to pass #_ to map and grep. The problems with hidden defaults are far greater than avoiding typing two characters. This space savings might have slightly more sense when Perl was primarily for quick and dirty work, but it makes no sense when writing anything beyond a few pages of code.
PS shift defaulting to #_ has found a niche in my $self = shift, but I find this only shines because Perl's argument handling is so poor.
The map function takes in a list, not an array. shift takes an array. With lists, on the other hand, #_/#ARGV may or may not be fair defaults.
I've seen many (code-golf) Perl programs out there and even if I can't read them (Don't know Perl) I wonder how you can manage to get such a small bit of code to do what would take 20 lines in some other programming language.
What is the secret of Perl? Is there a special syntax that allows you to do complex tasks in few keystrokes? Is it the mix of regular expressions?
I'd like to learn how to write powerful and yet short programs like the ones you know from the code-golf challenges here. What would be the best place to start out? I don't want to learn "clean" Perl - I want to write scripts even I don't understand anymore after a week.
If there are other programming languages out there with which I can write even shorter code, please tell me.
There are a number of factors that make Perl good for code golfing:
No data typing. Values can be used interchangeably as strings and numbers.
"Diagonal" syntax. Usually referred to as TMTOWTDI (There's more than one way to do it.)
Default variables. Most functions act on $_ if no argument is specified. (A few act
on #_.)
Functions that take multiple arguments (like split) often have defaults that
let you omit some arguments or even all of them.
The "magic" readline operator, <>.
Higher order functions like map and grep
Regular expressions are integrated into the syntax (i.e. not a separate library)
Short-circuiting operators return the last value tested.
Short-circuiting operators can be used for flow control.
Additionally, without strictures (which are off be default):
You don't need to declare variables.
Barewords auto-quote to strings.
undef becomes either 0 or '' depending on context.
Now that that's out of the way, let me be very clear on one point:
Golf is a game.
It's great to aspire to the level of perl-fu that allows you to be good at it, but in the name of $DIETY do not golf real code. For one, it's a horrible waste of time. You could spend an hour trying to trim out a few characters. Golfed code is fragile: it almost always makes major assumptions and blithely ignores error checking. Real code can't afford to be so careless. Finally, your goal as a programmer should be to write clear, robust, and maintainable code. There's a saying in programming: Always write your code as if the person who will maintain it is a violent sociopath who knows where you live.
So, by all means, start golfing; but realize that it's just playing around and treat it as such.
Most people miss the point of much of Perl's syntax and default operators. Perl is largely a "DWIM" (do what I mean) language. One of it's major design goals is to "make the common things easy and the hard things possible".
As part of that, Perl designers talk about Huffman coding of the syntax and think about what people need to do instead of just giving them low-level primitives. The things that you do often should take the least amount of typing, and functions should act like the most common behavior. This saves quite a bit of work.
For instance, the split has many defaults because there are some use cases where leaving things off uses the common case. With no arguments, split breaks up $_ on whitespace because that's a very common use.
my #bits = split;
A bit less common but still frequent case is to break up $_ on something else, so there's a slightly longer version of that:
my #bits = split /:/;
And, if you wanted to be explicit about the data source, you can specify the variable too:
my #bits = split /:/, $line;
Think of this as you would normally deal with life. If you have a common task that you perform frequently, like talking to your bartender, you have a shorthand for it the covers the usual case:
The usual
If you need to do something, slightly different, you expand that a little:
The usual, but with onions
But you can always note the specifics
A dirty Bombay Sapphire martini shaken not stirred
Think about this the next time you go through a website. How many clicks does it take for you to do the common operations? Why are some websites easy to use and others not? Most of the time, the good websites require you to do the least amount of work to do the common things. Unlike my bank which requires no fewer than 13 clicks to make a credit card bill payment. It should be really easy to give them money. :)
This doesn't answer the whole question, but in regards to writing code you won't be able to read in a couple days, here's a few languages that will encourage you to write short, virtually unreadable code:
J
K
APL
Golfscript
Perl has a lot of single character special variables that provide a lot of shortcuts eg $. $_ $# $/ $1 etc. I think it's that combined with the built in regular expressions, allows you to write some very concise but unreadable code.
Perl's special variables ($_, $., $/, etc.) can often be used to make code shorter (and more obfuscated).
I'd guess that the "secret" is in providing native operations for often repeated tasks.
In the domain that perl was originally envisioned for you often have to
Take input linewise
Strip off whitespace
Rip lines into words
Associate pairs of data
...
and perl simple provided operators to do these things. The short variable names and use of defaults for many things is just gravy.
Nor was perl the first language to go this way. Many of the features of perl were stolen more-or-less intact (or often slightly improved) from sed and awk and various shells. Good for Larry.
Certainly perl wasn't the last to go this way, you'll find similar features in python and php and ruby and ... People liked the results and weren't about to give them up just to get more regular syntax.
What's Java's secret of copying a variable in only one line, without worrying about buses and memory? Answer: the code is transformed to bigger code. Same for every language ever invented.
This may look like the recent question that asked why Perl doesn't allow one-liners to be "unblocked," but I found the answers to that question unsatisfactory because they either referred to the syntax documentation that says that braces are required, which I think is just begging the question, or ignored the question and simply gave braceless alternatives.
Why does Perl require braces for control statements like if and for? Put another way, why does Perl require blocks rather than statements, like some other popular languages allow?
One reason could be that some styles dictate that you should always use braces with control structures, even for one liners, in order to avoid breaking them later, e.g.:
if (condition)
myObject.doSomething();
else
myObject.doSomethingElse();
Then someone adds something more to the first part:
if (condition)
myObject.doSomething();
myObject.doSomethingMore(); // Syntax error next line
else
myObject.doSomethingElse();
Or worse:
if (condition)
myObject.doSomething();
else
myObject.doSomethingElse();
myObject.doSomethingMore(); // Compiles, but not what you wanted.
In Perl, these kinds of mistakes are not possible, because not using braces with control structures is always a syntax error. In effect, a style decision has been enforced at the language syntax level.
Whether that is any part of the real reason, only Larry's moustache knows.
One reason could be that some constructs would be ambiguous without braces :
foreach (#l) do_something unless $condition;
Does unless $condition apply to the whole thing or just the do_something statement?
Of course this could have been worked out with priority rules or something,
but it would have been yet another way to create confusing Perl code :-)
One problem with braceless if-else clauses is they can lead to syntactic ambiguity:
if (foo)
if (bar)
mumble;
else
tumble;
Given the above, under what condition is tumble executed? It could be interpreted as happening when !foo or foo && !bar. Adding braces clears up the ambiguity without dirtying the source too much. You could then go on to say that it's always a good idea to have the braces, so let's make the language require it and solve the endless C bickering over whether they should be used or not. Or, of course, you could address the problem by getting rid of the braces completely and using the indentation to indicate nesting. Both are ways of making clear, unambiguous code a natural thing rather than requiring special effort.
In Programming Perl (which Larry Wall co-authored), 3rd Edition, page 113, compound statements are defined in terms of expressions and blocks, not statements, and blocks have braces.
Note that unlike in C and Java,
[compound statements] are defined in
terms of BLOCKS, not statements.
This means that the braces are
requried--no dangling statements
allowed.
I don't know if that answers your question but it seems like in this case he chose to favor a simple language structure instead of making exceptions.
Perhaps not directly relevant to your question about (presumably) Perl 5 and earlier, but…
In Perl 6, control structures do not require parentheses:
if $x { say '$x is true' }
for <foo bar baz> -> $s { say "[$s]" }
This would be horrendously ambiguous if the braces were also optional.
Isn't it that Perl allows you to skip the braces, but then you have to write statement before condition? i.e.
#!/usr/bin/perl
my $a = 1;
if ($a == 1) {
print "one\n";
}
# is equivalent to:
print "one\n" if ($a == 1);
"Okay, so normally, you need braces around blocks, but not if the block is only one statement long, except, of course, if your statement would be ambiguous in a way that would be ruled by precedence rules not like you want if you omitted the braces -- in this case, you could also imagine the use of parentheses, but that would be inconsistent, because it is a block after all -- this is of course dependent on the respective precedence of the involved operators. In any case, you don't need to put semicolons after closing braces -- it is even wrong if you end an if statement that is followed by an else statement -- except that you absolutely must put a semicolon at the end of a header file in C++ (or was it C?)."
Seriously, I am glad for every explicitness and uniformity in code.
Just guessing here, but "unblocked" loops/ifs/etc. tend to be places where subtle bugs are introduced during code maintenance, since a sloppy maintainer might try to add another line "inside the loop" without realizing that it's not really inside.
Of course, this is Perl we're talking about, so probably any argument that relies on maintainability is suspect... :)
The perlstyle pod states
No space before the semicolon
and I can see no reason for that. I know that in English there should not be any space before characters made of 2 parts ( like '?',';','!' ), but I don't see why this should be a rule when writing Perl code.
I confess I personally use spaces before semicolons. My reason is that it makes the statement stands up a bit more clearer. I know it's not a very strong reason, but at least it's a reason.
print "Something\n with : some ; chars"; # good
print "Something\n with : some ; chars" ; # bad??
What's the reason for the second being bad?
From the first paragraph of the Description section:
Each programmer will, of course, have his or her own preferences in regards to formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.
And from the third paragraph of the Description section:
Regarding aesthetics of code lay out, about the only thing Larry cares strongly about is that the closing curly bracket of a multi-line BLOCK should line up with the keyword that started the construct. Beyond that, he has other preferences that aren't so strong:
It's just a convention among Perl programmers for style. If you don't like it, you can choose to ignore it. I would compare it to Sun's Java Style guidelines or the suggestions for indenting in the K&R C book. Some environments have their own guidelines. These just happen to be the suggestions for Perl.
As Jon Skeet said in a deleted answer to this question:
If you're happy to be inconsistent with what some other people like, then just write in the most readable form for you. If you're likely to be sharing your code with others - and particularly if they'll be contributing code too - then it's worth trying to agree some consistent style.
This is only my opinion, but I also realize that people read code in different ways so "bad' is relative. If you are the only person who ever looks at your code, you can do whatever you like. However, having looked at a lot of Perl code, I've only seen a couple of people put spaces before statement separators.
When you are doing something that is very different from what the rest of the world is doing, the difference stands out to other people because their brain don't see it in the same way it. Conversely, doing things differently makes it harder for you to read other people's code for the same reason: you don't see the visual patterns you expect.
My standard is to avoid visual clutter, and that I should see islands of context. Anything that stands out draws attention, (as you say you want), but I don't need to draw attention to my statement separators because I usually only have one statement per line. Anything that I don't really need to see should fade into the visual background. I don't like semi-colons to stand out. To me, a semicolon is a minor issue, and I want to reduce the number of things my eyes see as distinct groups.
There are times where the punctuation is important, and I want those to stand out, and in that case the semicolon needs to get out of the way. I often do this with the conditional operator, for instance:
my $foo = $boolean ?
$some_long_value
:
$some_other_value
;
If you are a new coder, typing that damned statement separator might be a big pain in your life, but your pains will change over time. Later on, the style fad you choose to mitigate one pain becomes the pain. You'll get used to the syntax eventually. The better question might be, why don't they already stand out? If you're using a good programmer font that has heavier and bigger punctuation faces, you might have an easier time seeing them.
Even if you decide to do that in your code, I find it odd that people do it in their writing. I never really noticed it before Stackoverflow, but a lot of programmers here put spaces before most punctuation.
It's not a rule, it's one of Larry Wall's style preferences. Style preferences are about what help you and the others who will maintain your code visually absorb information quickly and accurately.
I agree with Larry in this case, and find the space before the semicolon ugly and disruptive to my reading process, but others such as yourself may find the exact opposite. I would, of course, prefer that you use the sort of style I like, but there aren't any laws on the books about it.
Yet.
Like others have said, this is a matter of style, not a hard and fast rule. For instance, I don't like four spaces for indentation. I am a real tab for block level indentation/spaces for lining things up sort of programmer, so I ignore that section of perlstyle.
I also require a reason for style. If you cannot clearly state why you prefer a given style then the rule is pointless. In this case the reason is fairly easy to see. Whitespace that is not required is used to call attention to something or make something easier to read. So, does a semicolon deserve extra attention? Every expression (barring control structures) will end with a semicolon, and most expressions fit on one line. So calling attention to the expected case seems to be a waste of a programmers time and attention. This is why most programmers indent a line that is a continuation of an expression to call attention to the fact that it didn't end on one line:
open my $fh, "<", $file
or die "could not open '$file': $!";
Now, the second reason we use whitespace is make something easier to read. Is
foo("bar") ;
easier to read than
foo("bar");
I would make the claim that is harder to read because it is calling my attention to the semicolon, and I, for the most part, don't care about semicolons if the file is formatted correctly. Sure Perl cares, and if I am missing one it will tell me about it.
Feel free to put a space. The important thing is that you be consistent; consistency allows you to more readily spot errors.
There's one interesting coding style that places , and ; at the beginning of the following line (after indentation). While that's not to my taste, it works so long as it is consistent.
Update: an example of that coding style (which I do not advocate):
; sub capture (&;*)
{ my $code = shift
; my $fh = shift || select
; local $output
; no strict 'refs'
; if ( my $to = tied *$fh )
{ my $tc = ref $to
; bless $to, __PACKAGE__
; &{$code}()
; bless $to, $tc
}
else
{ tie *$fh , __PACKAGE__
; &{$code}()
; untie *$fh
}
; \$output
}
A defense can be found here: http://perl.4pro.net/pcs.html.
(2011 Update: that page seems to have gone AWOL; a rescued copy can be seen here: http://ysth.info/pcs.html)
Well, it's style, not a rule. Style rules are by definition fairly arbitrary. As for why you shouldn't put spaces before semicolons, it's simply because that's The Way It's Done. Not just with Perl, but with C and all the other curlies-and-semicolons languages going back to C and newer C-influenced ones like C++, C#, Objective C, Javascript, Java, PHP, etc.
Because people don't expect it . It looks strange when you do it .
The reason I would cite is to be consistent within a project. I have been in a project where the majority of programmers would not insert the space but one programmer does. If he works on a defect he may routinely add the space in the lines of code he is examining as that is what he likes and there is nothing in the style guide to say otherwise.
The visual diff tools in use can't determine this so show a mass of line changes when only one may have changed and becomes difficult to review. (OK this could be an argument for a better diff tools but embedded work tends to restrict tool choice more).
Maybe a suitable guide for this would be to choose whichever format you want for you semicolon but don't change others unless you modify the statement itself.
I really don't like it. But it's 100% a personal decision and group convention you must make.
Code style is just a set of rules to make reading and maintaining code easier.
There is no real bad style, but some are more accepted than others. And they are of course the source for some "religious battles" (to call curly braces style ;-) ).
To make a real life comparison, we have trafic lights with red, yellow/orange and green. Despite the psychological effects of the colors, it is not wrong to use Purple, Brown and Pink, but just because we are all used to the colors there are less trafic accidents.