Nested dereferencing arrows in Perl: to omit or not to omit? - perl

In Perl, when you have a nested data structure, it is permissible to omit dereferencing arrows at the second and deeper levels of nesting. In other words, the following two syntaxes are identical:
my $hash_ref = { 1 => [ 11, 12, 13 ], 3 => [31, 32] };
my $elem1 = $hash_ref->{1}->[1];
my $elem2 = $hash_ref->{1}[1]; # exactly the same as above
Now, my question is, is there a good reason to choose one style over the other?
It seems to be a popular bone of stylistic contention (Just on SO, I accidentally bumped into this and this in the space of 5 minutes).
So far, almost none of the usual suspects says anything definitive:
perldoc merely says "you are free to omit the pointer dereferencing arrow".
Conway's "Perl Best Practices" says "whenever possible, dereference with arrows", but it appears to apply only to dereferencing the main reference, not to the optional arrows at the second level of nested data structures.
"Mastering Perl for Bioinformatics" author James Tisdall doesn't state a very solid preference either:
"The sharp-witted reader may have noticed that we seem to be omitting arrow operators between array subscripts. (After all, these are anonymous arrays of anonymous arrays of anonymous arrays, etc., so shouldn't they be written $array->[$i]->[$j]->[$k]?) Perl allows this; only the arrow operator between the variable name and the first array subscript is required. It makes things easier on the eyes and helps avoid carpal tunnel syndrome. On the other hand, you may prefer to keep the dereferencing arrows in place, to make it clear you are dealing with references. Your choice."
UPDATED "Intermediate Perl", as per its co-author brian d foy, recommends omitting the arrows. See brian's full answer below.
Personally, I'm on the side of "always put arrows in, since it's more readable and obvious they're dealing with a reference".
UPDATE To be more specific re: readability, in case of a multi-nested expression where subscripts themselves are expressions, the arrows help to "visually tokenize" the expressions by more obviously separating subscripts from one another.

Unless you really enjoy typing or excessively long lines, don't use the arrows when you don't need them. Subscripts next to subscripts imply references, so the competent programmer doesn't need extra clues to figure that out.
I disagree that it's more readable to have extra arrows. It's definitely unconventional to have them moving the interesting parts of the term further away from each other.
In Intermediate Perl, where we actually teach references, we tell you to omit the unnecessary arrows.
Also, remember there is no such thing as "readability". There is only what you (and others) have trained your eyes to recognize as patterns. You don't read things character-by-character then figure out what they mean. You see groups of things that you've seen before and recognize them. At the base syntax level that you are talking about, your "readability" is just your ability to recognize patterns. It's easier to recognize patterns the more you use them, so it's not surprising that what you do now is more "readable" to you. New styles seem odd at first, but eventually become more recognizable, and thus more "readable".
The example you give in your comments isn't hard to read because it lacks arrows. It's still hard to read with arrows:
$expr1->[$sub1{$x}]{$sub2[$y]-33*$x3}{24456+myFunct($abc)}
$expr1->[$sub1{$x}]->{$sub2[$y]-33*$x3}->{24456+myFunct($abc)}
I write that sort of code like this, using these sorts of variable names to remind the next coder about the sort of container each level is:
my $index = $sub1{$x};
my $key1 = $sub2[$y]-33*$x3;
my $key2 = 24456+myFunct($abc);
$expr1->[ $index ]{ $key1 }{ $key2 };
To make that even better, hide the details in a subroutine (that's what they are there for :) so you never have to play with that mess of a data structure directly. This is more readable than any of them:
my $value = get_value( $index, $key1, $key2 );
my $value = get_value(
    $sub1{$x},
    $sub2[$y]-33*$x3,
    24456+myFunct($abc)
);
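A minimal sketch of what such a wrapper might look like. The answer only names get_value and its three arguments; the $data structure and its contents here are made up for illustration, and the sketch assumes the subroutine can see that structure.

```perl
use strict;
use warnings;

# Hypothetical nested structure: an array of hashes of hashes.
my $data = [
    { colors => { red  => 'ff0000', green => '00ff00' } },
    { colors => { blue => '0000ff' } },
];

# Hide the container layout behind one subroutine, so callers
# never write the chain of subscripts themselves.
sub get_value {
    my ( $index, $key1, $key2 ) = @_;
    return $data->[$index]{$key1}{$key2};
}

print get_value( 0, 'colors', 'green' ), "\n";    # prints 00ff00
```

If the layout of the structure changes later, only the body of get_value has to change.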

Since the -> arrow is non-optionally used for method calls, I prefer to only use it to call code. So I would use the following:
$object->method;
$coderef->();
$$dispatch{name}->();
$$arrayref[1];
$$arrayref[1][5];
@$arrayref[1 .. 5];
@$arrayref;
$$hashref{foo};
$$hashref{foo}{bar};
@$hashref{qw/foo bar/};
%$hashref;
Two sigils back to back always means a dereference, and the structure remains consistent across all forms of dereferencing (scalar, slice, all).
It also keeps all parts of the variable "together" which I find more readable, and it's shorter :)
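As a quick check that the sigil style and the arrow style reach the same data, here is a small sketch. The variable contents are invented for the example.

```perl
use strict;
use warnings;

my $arrayref = [ 10, [ 0, 1, 2, 3, 4, 50 ], 30 ];
my $hashref  = { foo => { bar => 'baz' } };

# Element access: both spellings reach the same element.
my $same_elem = $$arrayref[1][5] == $arrayref->[1]->[5];        # both are 50
my $same_key  = $$hashref{foo}{bar} eq $hashref->{foo}->{bar};  # both are 'baz'

# Slices and whole-container dereferences put @ in front of the reference.
my @slice = @$arrayref[ 0, 2 ];    # array slice: (10, 30)
my $count = @$arrayref;            # whole array in scalar context: 3

print "same element: ", ( $same_elem && $same_key ? "yes" : "no" ), "\n";
print "slice: @slice, count: $count\n";
```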

I have always written all of the arrows. I agree with you, they separate better the different subscripts. Plus I use curly braces for regular expressions, so to me {foo}{bar} is a substitution: s{foo}{bar} stands out more from $s->{foo}->{bar} than from $s->{foo}{bar}.
I don't think it's a big thing though, reading code that omits the extra arrows is not a problem (as opposed to any indentation that's not the one I use ;--)

Related

ISO Emacs [C]Perl-mode colorize hash references like hashes

I still use Perl, some new code, and maintaining old code. I use emacs and cperl-mode. I like syntax coloring.
At first (many years ago) I disliked cperl-mode's special coloring of arrays and hashes, but it has grown on me. To the point where I will sometimes prefer to use a hash rather than a hash reference, just to get the special coloring. That may not sound so bad - but if I admit to occasionally using a global %hash or $hash{key} rather than an object member $hashref->{key}, just to get the coloring, well, it is bad. I.e. syntax coloring is making me want to follow bad programming practices.
So, my question is: does anyone have emacs/elisp configuration code to get cperl-mode or perl-mode to colorize a hash reference like $hashref->{key} in the same or similar to $hash{key}?
Let me use bold to indicate the places that might be colored:
cperl-mode does now: $hash{key}
what I would like: $hash->{key}
I have done extensive customization of coloring (faces) in emacs - e.g. colorizing to distinguish DEBUG code from non-debug code, TEST from non-test, etc. - but I have not managed to get this syntax coloring in cperl-mode working. (FOLLOW-ON: I eventually got font-lock-add-keywords working, as shown in my answer to my own question below.)
In the example below, you can see that $hashref->{key} is not colored, while $hash{key} is.
Similarly for array refs, and perhaps other refs.
I realize that coloring refs will only apply to derefs like $hashref->{key}, and not to other stuff like $hashref1 = $hashref2. I think that I can live with that.
You can set cperl-highlight-variables-indiscriminately to t (via customizing it) to get scalar variables coloured not only when declared but always.
Using the same colour for @ref and $ref is confusing, as they are different variable types (and different variables); similarly, it's confusing to use the scalar colour for $ref but array colour for $ref->[0] as they are the same variable.
Also, Perl being Perl, would you use all three colours here?
if (ref $ref eq 'ARRAY') {
return $ref->[0]
} elsif (ref $ref eq 'HASH') {
return $ref->{key}
}
I dislike answering my own question, but the wild goose chase answer suggested annoyed me enough to figure out what my attempts were doing wrong.
(I hate it when I ask for X, somebody answers Y, and disses X. Especially when X is doable, as here.)
Here is working code from my .emacs:
(defun ag-extend-cperl-font-lock-keywords ()
  (interactive)
  (font-lock-add-keywords
   'cperl-mode
   '(
     (
      "\\($[a-zA-Z_][a-zA-Z_0-9]*->\\){"
      1 'cperl-hash-face t
      )
     (
      "\\($[a-zA-Z_][a-zA-Z_0-9]*->\\)\\["
      1 'cperl-array-face t
      )
     (
      "\\($[a-zA-Z_][a-zA-Z_0-9]*->\\)("
      1 'font-lock-function-name-face t
      )
     )
   t
   )
  )
(ag-extend-cperl-font-lock-keywords)
giving
Just for grins, @choroba's example of multiple types:
I haven't decided if I should create separate faces for hashrefs, arrayrefs, and coderefs. For now, just using the same face as their non-ref counterparts. Including -> as part of the text colored provides some distinction between non-ref and ref.
Nor have I yet decided if I want to extend to the various other Perl syntaxes.
From https://perldoc.perl.org/perlref.html:
But now that I have the font-lock-add-keywords invocation, those details I can fix at my leisure.
You can't do what you want without extending cperl-mode. cperl-mode doesn't understand references. There's no reference "face" for you to customize, and no "thing" to apply that face to. If you want to render a hash reference like a hash (a la your example) I'd start with modifying the second regex in the definition of t-font-lock-keywords-1 in cperl-mode.el. That should take care of hash and array refs. Beware of cperl-highlight-variables-indiscriminately overriding your changes. If you want to do something fancier, like have a "reference face", you'll have to
define a face
add the face to customize (if you want)
hack t-font-lock-keywords-1 and apply the face to a regex match
Of course it might just be easier to send a feature request upstream. cperl-mode is ancient and could definitely use some modernization.

Why doesn't map read from @ARGV/@_?

Is there a good reason for map to not read from @_ (in functions) or @ARGV (anywhere else) when not given an argument list?
I can't say why Larry didn't make map, grep and the other list functions operate on @_ like pop and shift do, but I can tell you why I wouldn't. Default variables used to be in vogue, but Perl programmers have discovered that most of the "default" behaviors cause more problems than they solve. I doubt they would make it into the language today.
The first problem is remembering what a function does when passed no arguments. Does it act on a hidden variable? Which one? You just have to know by rote, and that makes it a lot more work to learn, read and write the language. You're probably going to get it wrong and that means bugs. This could be mitigated by Perl being consistent about it (i.e. ALL functions which take lists operate on @_ and ALL functions which take scalars operate on $_) but there are more problems.
The second problem is the behavior changes based on context. Take some code outside of a subroutine, or put it into a subroutine, and suddenly it works differently. That makes refactoring harder. If you made it work on just @_ or just @ARGV then this problem goes away.
Third is default variables have a tendency to be quietly modified as well as read. $_ is dangerous for this reason, you never know when something is going to overwrite it. If the use of @_ as the default list variable were adopted, this behavior would likely leak in.
Fourth, it would probably lead to complicated syntax problems. I'd imagine this was one of the original reasons keeping it from being added to the language, back when $_ was in vogue.
Fifth, @ARGV as a default makes some sense when you're writing scripts that primarily work with @ARGV... but it doesn't make any sense when working on a library. Perl programmers have shifted from writing quick scripts to writing libraries.
Sixth, using $_ as default is a way of chaining together scalar operations without having to write the variable over and over again. This might have been mitigated if Perl was more consistent about its return values, and if regexes didn't have special syntax, but there you have it. Lists can already be chained, map { ... } sort { ... } grep /.../, @foo, so that use case is handled by a more efficient mechanism.
Finally, it's of very limited use. It's very rare that you want to pass @_ to map and grep. The problems with hidden defaults are far greater than avoiding typing two characters. This space savings might have made slightly more sense when Perl was primarily for quick and dirty work, but it makes no sense when writing anything beyond a few pages of code.
PS shift defaulting to @_ has found a niche in my $self = shift, but I find this only shines because Perl's argument handling is so poor.
The map function takes in a list, not an array. shift takes an array. With lists, on the other hand, @_/@ARGV may or may not be fair defaults.
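To make the difference concrete, here is a small sketch. The subroutine names first_arg and doubled are made up for the example.

```perl
use strict;
use warnings;

# shift with no argument works on @_ inside a subroutine.
sub first_arg {
    my $first = shift;    # same as: shift @_
    return $first;
}

# map has no such default; the list has to be spelled out.
sub doubled {
    return map { $_ * 2 } @_;
}

my @result = doubled( 1, 2, 3 );
print first_arg( 'a', 'b' ), " @result\n";    # prints "a 2 4 6"
```

The cost of being explicit with map is the two characters of @_, which is the trade-off discussed above.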

Is this trivial function silly?

I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
my $string = shift;
my $regex = shift;
my $replace = shift;
$string =~ s/$regex/$replace/gi;
return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn the meaning of once, and it's not like it's some weird corner feature. With replace_string, everyone has to learn the details, and it's going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// it's hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
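The "Insecure" point above can be shown in a few lines. replace_string is the function from the question; replace_literal is a hypothetical variant written for this sketch, not something from the original code.

```perl
use strict;
use warnings;

# The function from the question, with the same behavior.
sub replace_string {
    my ( $string, $regex, $replace ) = @_;
    $string =~ s/$regex/$replace/gi;
    return $string;
}

# Hypothetical safer variant: \Q...\E quotes regex metacharacters,
# so the second argument is matched as a literal string.
sub replace_literal {
    my ( $string, $literal, $replace ) = @_;
    $string =~ s/\Q$literal\E/$replace/gi;
    return $string;
}

print replace_string( 'a.b', '.', '-' ), "\n";     # prints "---"
print replace_literal( 'a.b', '.', '-' ), "\n";    # prints "a-b"
```

A caller who thinks the second argument is a plain string gets every character replaced in the first call, because "." is a regex wildcard.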
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.

What is Perl's secret of getting small code do so much?

I've seen many (code-golf) Perl programs out there and even if I can't read them (Don't know Perl) I wonder how you can manage to get such a small bit of code to do what would take 20 lines in some other programming language.
What is the secret of Perl? Is there a special syntax that allows you to do complex tasks in few keystrokes? Is it the mix of regular expressions?
I'd like to learn how to write powerful and yet short programs like the ones you know from the code-golf challenges here. What would be the best place to start out? I don't want to learn "clean" Perl - I want to write scripts even I don't understand anymore after a week.
If there are other programming languages out there with which I can write even shorter code, please tell me.
There are a number of factors that make Perl good for code golfing:
No data typing. Values can be used interchangeably as strings and numbers.
"Diagonal" syntax. Usually referred to as TMTOWTDI (There's more than one way to do it.)
Default variables. Most functions act on $_ if no argument is specified. (A few act on @_.)
Functions that take multiple arguments (like split) often have defaults that let you omit some arguments or even all of them.
The "magic" readline operator, <>.
Higher order functions like map and grep
Regular expressions are integrated into the syntax (i.e. not a separate library)
Short-circuiting operators return the last value tested.
Short-circuiting operators can be used for flow control.
Additionally, without strictures (which are off by default):
You don't need to declare variables.
Barewords auto-quote to strings.
undef becomes either 0 or '' depending on context.
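A few of the points above, sketched in ordinary (non-golfed) Perl. The variable names are invented for the example, and strictures are left on, so it only illustrates the first list.

```perl
use strict;
use warnings;

# || returns the last value it evaluated, not just true or false.
my $given = '';
my $name  = $given || 'anonymous';    # 'anonymous'

# Short-circuiting as flow control: the right side runs only
# when the left side is false.
my @log;
$given or push @log, 'no name given';

# No data typing: a string is used as a number where needed.
my $sum = '3' + 4;                    # 7

print "$name $sum @log\n";            # prints "anonymous 7 no name given"
```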
Now that that's out of the way, let me be very clear on one point:
Golf is a game.
It's great to aspire to the level of perl-fu that allows you to be good at it, but in the name of $DIETY do not golf real code. For one, it's a horrible waste of time. You could spend an hour trying to trim out a few characters. Golfed code is fragile: it almost always makes major assumptions and blithely ignores error checking. Real code can't afford to be so careless. Finally, your goal as a programmer should be to write clear, robust, and maintainable code. There's a saying in programming: Always write your code as if the person who will maintain it is a violent sociopath who knows where you live.
So, by all means, start golfing; but realize that it's just playing around and treat it as such.
Most people miss the point of much of Perl's syntax and default operators. Perl is largely a "DWIM" (do what I mean) language. One of its major design goals is to "make the common things easy and the hard things possible".
As part of that, Perl designers talk about Huffman coding of the syntax and think about what people need to do instead of just giving them low-level primitives. The things that you do often should take the least amount of typing, and functions should act like the most common behavior. This saves quite a bit of work.
For instance, the split has many defaults because there are some use cases where leaving things off uses the common case. With no arguments, split breaks up $_ on whitespace because that's a very common use.
my @bits = split;
A bit less common but still frequent case is to break up $_ on something else, so there's a slightly longer version of that:
my @bits = split /:/;
And, if you wanted to be explicit about the data source, you can specify the variable too:
my @bits = split /:/, $line;
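The three forms can be tried side by side. This is only a sketch; the sample strings are made up.

```perl
use strict;
use warnings;

my @bits;
for ('  the quick  brown fox ') {
    @bits = split;            # splits $_ on whitespace, ignoring leading whitespace
}
print scalar(@bits), "\n";    # prints 4

my $line = 'root:x:0:0';
for ($line) {
    @bits = split /:/;        # still reads $_, but with a different separator
}
my @explicit = split /:/, $line;    # data source named explicitly
print "@bits | @explicit\n";        # prints "root x 0 0 | root x 0 0"
```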
Think of this as you would normally deal with life. If you have a common task that you perform frequently, like talking to your bartender, you have a shorthand for it that covers the usual case:
The usual
If you need to do something slightly different, you expand that a little:
The usual, but with onions
But you can always note the specifics
A dirty Bombay Sapphire martini shaken not stirred
Think about this the next time you go through a website. How many clicks does it take for you to do the common operations? Why are some websites easy to use and others not? Most of the time, the good websites require you to do the least amount of work to do the common things. Unlike my bank which requires no fewer than 13 clicks to make a credit card bill payment. It should be really easy to give them money. :)
This doesn't answer the whole question, but in regards to writing code you won't be able to read in a couple days, here's a few languages that will encourage you to write short, virtually unreadable code:
J
K
APL
Golfscript
Perl has a lot of single character special variables that provide a lot of shortcuts, e.g. $. $_ $@ $/ $1 etc. I think it's that, combined with the built-in regular expressions, that allows you to write some very concise but unreadable code.
Perl's special variables ($_, $., $/, etc.) can often be used to make code shorter (and more obfuscated).
I'd guess that the "secret" is in providing native operations for often repeated tasks.
In the domain that perl was originally envisioned for you often have to
Take input linewise
Strip off whitespace
Rip lines into words
Associate pairs of data
...
and perl simply provided operators to do these things. The short variable names and use of defaults for many things is just gravy.
Nor was perl the first language to go this way. Many of the features of perl were stolen more-or-less intact (or often slightly improved) from sed and awk and various shells. Good for Larry.
Certainly perl wasn't the last to go this way, you'll find similar features in python and php and ruby and ... People liked the results and weren't about to give them up just to get more regular syntax.
What's Java's secret of copying a variable in only one line, without worrying about buses and memory? Answer: the code is transformed to bigger code. Same for every language ever invented.

Why is it bad to put a space before a semicolon?

The perlstyle pod states
No space before the semicolon
and I can see no reason for that. I know that in English there should not be any space before characters made of 2 parts ( like '?',';','!' ), but I don't see why this should be a rule when writing Perl code.
I confess I personally use spaces before semicolons. My reason is that it makes the statement stand out a bit more clearly. I know it's not a very strong reason, but at least it's a reason.
print "Something\n with : some ; chars"; # good
print "Something\n with : some ; chars" ; # bad??
What's the reason for the second being bad?
From the first paragraph of the Description section:
Each programmer will, of course, have his or her own preferences in regards to formatting, but there are some general guidelines that will make your programs easier to read, understand, and maintain.
And from the third paragraph of the Description section:
Regarding aesthetics of code lay out, about the only thing Larry cares strongly about is that the closing curly bracket of a multi-line BLOCK should line up with the keyword that started the construct. Beyond that, he has other preferences that aren't so strong:
It's just a convention among Perl programmers for style. If you don't like it, you can choose to ignore it. I would compare it to Sun's Java Style guidelines or the suggestions for indenting in the K&R C book. Some environments have their own guidelines. These just happen to be the suggestions for Perl.
As Jon Skeet said in a deleted answer to this question:
If you're happy to be inconsistent with what some other people like, then just write in the most readable form for you. If you're likely to be sharing your code with others - and particularly if they'll be contributing code too - then it's worth trying to agree some consistent style.
This is only my opinion, but I also realize that people read code in different ways so "bad" is relative. If you are the only person who ever looks at your code, you can do whatever you like. However, having looked at a lot of Perl code, I've only seen a couple of people put spaces before statement separators.
When you are doing something that is very different from what the rest of the world is doing, the difference stands out to other people because their brains don't see it the same way. Conversely, doing things differently makes it harder for you to read other people's code for the same reason: you don't see the visual patterns you expect.
My standard is to avoid visual clutter, and that I should see islands of context. Anything that stands out draws attention, (as you say you want), but I don't need to draw attention to my statement separators because I usually only have one statement per line. Anything that I don't really need to see should fade into the visual background. I don't like semi-colons to stand out. To me, a semicolon is a minor issue, and I want to reduce the number of things my eyes see as distinct groups.
There are times where the punctuation is important, and I want those to stand out, and in that case the semicolon needs to get out of the way. I often do this with the conditional operator, for instance:
my $foo = $boolean ?
$some_long_value
:
$some_other_value
;
If you are a new coder, typing that damned statement separator might be a big pain in your life, but your pains will change over time. Later on, the style fad you choose to mitigate one pain becomes the pain. You'll get used to the syntax eventually. The better question might be, why don't they already stand out? If you're using a good programmer font that has heavier and bigger punctuation faces, you might have an easier time seeing them.
Even if you decide to do that in your code, I find it odd that people do it in their writing. I never really noticed it before Stackoverflow, but a lot of programmers here put spaces before most punctuation.
It's not a rule, it's one of Larry Wall's style preferences. Style preferences are about what help you and the others who will maintain your code visually absorb information quickly and accurately.
I agree with Larry in this case, and find the space before the semicolon ugly and disruptive to my reading process, but others such as yourself may find the exact opposite. I would, of course, prefer that you use the sort of style I like, but there aren't any laws on the books about it.
Yet.
Like others have said, this is a matter of style, not a hard and fast rule. For instance, I don't like four spaces for indentation. I am a real tab for block level indentation/spaces for lining things up sort of programmer, so I ignore that section of perlstyle.
I also require a reason for style. If you cannot clearly state why you prefer a given style then the rule is pointless. In this case the reason is fairly easy to see. Whitespace that is not required is used to call attention to something or make something easier to read. So, does a semicolon deserve extra attention? Every expression (barring control structures) will end with a semicolon, and most expressions fit on one line. So calling attention to the expected case seems to be a waste of a programmer's time and attention. This is why most programmers indent a line that is a continuation of an expression to call attention to the fact that it didn't end on one line:
open my $fh, "<", $file
or die "could not open '$file': $!";
Now, the second reason we use whitespace is to make something easier to read. Is
foo("bar") ;
easier to read than
foo("bar");
I would make the claim that it is harder to read because it is calling my attention to the semicolon, and I, for the most part, don't care about semicolons if the file is formatted correctly. Sure, Perl cares, and if I am missing one it will tell me about it.
Feel free to put a space. The important thing is that you be consistent; consistency allows you to more readily spot errors.
There's one interesting coding style that places , and ; at the beginning of the following line (after indentation). While that's not to my taste, it works so long as it is consistent.
Update: an example of that coding style (which I do not advocate):
; sub capture (&;*)
{ my $code = shift
; my $fh = shift || select
; local $output
; no strict 'refs'
; if ( my $to = tied *$fh )
{ my $tc = ref $to
; bless $to, __PACKAGE__
; &{$code}()
; bless $to, $tc
}
else
{ tie *$fh , __PACKAGE__
; &{$code}()
; untie *$fh
}
; \$output
}
A defense can be found here: http://perl.4pro.net/pcs.html.
(2011 Update: that page seems to have gone AWOL; a rescued copy can be seen here: http://ysth.info/pcs.html)
Well, it's style, not a rule. Style rules are by definition fairly arbitrary. As for why you shouldn't put spaces before semicolons, it's simply because that's The Way It's Done. Not just with Perl, but with C and all the other curlies-and-semicolons languages going back to C and newer C-influenced ones like C++, C#, Objective C, Javascript, Java, PHP, etc.
Because people don't expect it . It looks strange when you do it .
The reason I would cite is to be consistent within a project. I have been in a project where the majority of programmers would not insert the space but one programmer did. If he worked on a defect he would routinely add the space in the lines of code he was examining, as that is what he likes and there was nothing in the style guide to say otherwise.
The visual diff tools in use can't ignore this, so they show a mass of line changes when only one may really have changed, and the change becomes difficult to review. (OK, this could be an argument for better diff tools, but embedded work tends to restrict tool choice more.)
Maybe a suitable guide for this would be to choose whichever format you want for your semicolons but not to change other people's unless you modify the statement itself.
I really don't like it. But it's 100% a personal decision and group convention you must make.
Code style is just a set of rules to make reading and maintaining code easier.
There is no real bad style, but some are more accepted than others. And they are of course the source for some "religious battles" (to call curly braces style ;-) ).
To make a real life comparison, we have traffic lights with red, yellow/orange and green. Despite the psychological effects of the colors, it is not wrong to use purple, brown and pink, but because we are all used to the usual colors there are fewer traffic accidents.