Is there a good reason for map to not read from #_ (in functions) or #ARGV (anywhere else) when not given an argument list?
I can't say why Larry didn't make map, grep and the other list functions operate on #_ like pop and shift do, but I can tell you why I wouldn't. Default variables used to be in vogue, but Perl programmers have discovered that most of the "default" behaviors cause more problems than they solve. I doubt they would make it into the language today.
The first problem is remembering what a function does when passed no arguments. Does it act on a hidden variable? Which one? You just have to know by rote, and that makes it a lot more work to learn, read and write the language. You're probably going to get it wrong and that means bugs. This could be mitigated by Perl being consistent about it (ie. ALL functions which take lists operate on #_ and ALL functions which take scalars operate on $_) but there's more problems.
The second problem is the behavior changes based on context. Take some code outside of a subroutine, or put it into a subroutine, and suddenly it works differently. That makes refactoring harder. If you made it work on just #_ or just #ARGV then this problem goes away.
Third is default variables have a tendency to be quietly modified as well as read. $_ is dangerous for this reason, you never know when something is going to overwrite it. If the use of #_ as the default list variable were adopted, this behavior would likely leak in.
Fourth, it would probably lead to complicated syntax problems. I'd imagine this was one of the original reasons keeping it from being added to the language, back when $_ was in vogue.
Fifth, #ARGV as a default makes some sense when you're writing scripts that primarily work with #ARGV... but it doesn't make any sense when working on a library. Perl programmers have shifted from writing quick scripts to writing libraries.
Sixth, using $_ as default is a way of chaining together scalar operations without having to write the variable over and over again. This might have been mitigated if Perl was more consistent about its return values, and if regexes didn't have special syntax, but there you have it. Lists can already be chained, map { ... } sort { ... } grep /.../, #foo, so that use case is handled by a more efficient mechanism.
Finally, it's of very limited use. It's very rare that you want to pass #_ to map and grep. The problems with hidden defaults are far greater than avoiding typing two characters. This space savings might have slightly more sense when Perl was primarily for quick and dirty work, but it makes no sense when writing anything beyond a few pages of code.
PS shift defaulting to #_ has found a niche in my $self = shift, but I find this only shines because Perl's argument handling is so poor.
The map function takes in a list, not an array. shift takes an array. With lists, on the other hand, #_/#ARGV may or may not be fair defaults.
Related
Is there a reason why CGI.pm examples rely on list concatenation rather than on string concatenation? are the two interchangeable? think
print q->hidden(-name =>'rm', -value => $var).
q->submit(-name =>"rm$var");
vs.
print q->hidden(-name =>'rm', -value => $var),
q->submit(-name =>"rm$var");
I have a specific reason for asking. it is very convenient to build up a page from strings. after all, perl understands scalar strings as basic types.
however, I am completely perplexed by some odd behavior in the string concat context. Specifically, I have encountered occasional cases in which the $var in the hidden is not the same as the one in the submit button. I could work around this, but I would rather understand CGI.pm .
could someone please explain whether string concat should work?
Unless you change the default value of $,,
print EXPR1, EXPR2;
and
print EXPR1 . EXPR2;
produce the same result if the expressions aren't context-specific. The functions in question always return an HTML string, so you're good.
You're right that the two examples have the same effect (well, as ikegami says, unless you have changed $,). The only difference is, of course, that in the first example, print is passed one string and in the second example, it gets two.
But have you read the comments about the HTML generation functions in recent versions of the CGI.pm documentation?
All HTML generation functions within CGI.pm are no longer being
maintained. Any issues, bugs, or patches will be rejected unless they
relate to fundamentally broken page rendering.
The rationale for this is that the HTML generation functions of CGI.pm
are an obfuscation at best and a maintenance nightmare at worst. You
should be using a template engine for better separation of concerns.
See CGI::Alternatives for an example of using CGI.pm with the
Template::Toolkit module.
These functions, and perldoc for them, will continue to exist in the
v4 releases of CGI.pm but may be deprecated (soft) in v5 and beyond.
I would seriously consider moving way from these functions (and, indeed, CGI.pm itself) for new work.
I came across a function today that made me stop and think. I can't think of a good reason to do it:
sub replace_string {
my $string = shift;
my $regex = shift;
my $replace = shift;
$string =~ s/$regex/$replace/gi;
return $string;
}
The only possible value I can see to this is that it gives you the ability to control the default options used with a substitution, but I don't consider that useful. My first reaction upon seeing this function get called is "what does this do?". Once I learn what it does, I am going to assume it does that from that point on. Which means if it changes, it will break any of my code that needs it to do that. This means the function will likely never change, or changing it will break lots of code.
Right now I want to track down the original programmer and beat some sense into him or her. Is this a valid desire, or am I missing some value this function brings to the table?
The problems with that function include:
Opaque: replace_string doesn't tell you that you're doing a case-insensitive, global replace without escaping.
Non-idiomatic: $string =~ s{$this}{$that}gi is something you can learn what it means once, and its not like its some weird corner feature. replace_string everyone has to learn the details of, and its going to be different for everyone who writes it.
Inflexible: Want a non-global search-and-replace? Sorry. You can put in some modifiers by passing in a qr// but that's far more advanced knowledge than the s/// its hiding.
Insecure: A user might think that the function takes a string, not a regex. If they put in unchecked user input they are opening up a potential security hole.
Slower: Just to add the final insult.
The advantages are:
Literate: The function name explains what it does without having to examine the details of the regular expression (but it gives an incomplete explanation).
Defaults: The g and i defaults are always there (but that's non-obvious from the name).
Simpler Syntax: Don't have to worry about the delimiters (not that s{}{} is difficult).
Protection From Global Side Effects: Regex matches set a salad of global variables ($1, $+, etc...) but they're automatically locally scoped to the function. They won't interfere if you're making use of them for another regex.
A little overzealous with the encapsulation.
print replace_string("some/path", "/", ":");
Yes, you get some magic in not having to replace / with a different delimiter or escape / in the regex.
If it's just a verbose replacement for s/// then I'd guess that it was written by someone who came to Perl from a language where using regular expressions required extra syntax and who is/was more comfortable coding that way. If that's the case I'd classify it as Perl baby-talk: silly and awkward to seasoned coders but not bad -- not bad enough to warrant a beating, anyway. ;)
If I squint really hard I can almost see cases where such a function might be useful: applying a bunch of patterns to a bunch of strings, allowing user input for the terms, supplying a CODE reference for a callback...
My first reaction upon seeing that is a new Perl programmer didn't want to remember the syntax for a regular expression and created a function he or she could easily remember, without learning the syntax.
The only reason I can see other than the ones mentioned already ( new programmer does not want to remember regex syntax ) is that it is possible they may be using some IDE that does not have any syntax highlighting for regex, but it does exist for functions they've written. Not the best of reasons, but plausible.
I've seen many (code-golf) Perl programs out there and even if I can't read them (Don't know Perl) I wonder how you can manage to get such a small bit of code to do what would take 20 lines in some other programming language.
What is the secret of Perl? Is there a special syntax that allows you to do complex tasks in few keystrokes? Is it the mix of regular expressions?
I'd like to learn how to write powerful and yet short programs like the ones you know from the code-golf challenges here. What would be the best place to start out? I don't want to learn "clean" Perl - I want to write scripts even I don't understand anymore after a week.
If there are other programming languages out there with which I can write even shorter code, please tell me.
There are a number of factors that make Perl good for code golfing:
No data typing. Values can be used interchangeably as strings and numbers.
"Diagonal" syntax. Usually referred to as TMTOWTDI (There's more than one way to do it.)
Default variables. Most functions act on $_ if no argument is specified. (A few act
on #_.)
Functions that take multiple arguments (like split) often have defaults that
let you omit some arguments or even all of them.
The "magic" readline operator, <>.
Higher order functions like map and grep
Regular expressions are integrated into the syntax (i.e. not a separate library)
Short-circuiting operators return the last value tested.
Short-circuiting operators can be used for flow control.
Additionally, without strictures (which are off be default):
You don't need to declare variables.
Barewords auto-quote to strings.
undef becomes either 0 or '' depending on context.
Now that that's out of the way, let me be very clear on one point:
Golf is a game.
It's great to aspire to the level of perl-fu that allows you to be good at it, but in the name of $DIETY do not golf real code. For one, it's a horrible waste of time. You could spend an hour trying to trim out a few characters. Golfed code is fragile: it almost always makes major assumptions and blithely ignores error checking. Real code can't afford to be so careless. Finally, your goal as a programmer should be to write clear, robust, and maintainable code. There's a saying in programming: Always write your code as if the person who will maintain it is a violent sociopath who knows where you live.
So, by all means, start golfing; but realize that it's just playing around and treat it as such.
Most people miss the point of much of Perl's syntax and default operators. Perl is largely a "DWIM" (do what I mean) language. One of it's major design goals is to "make the common things easy and the hard things possible".
As part of that, Perl designers talk about Huffman coding of the syntax and think about what people need to do instead of just giving them low-level primitives. The things that you do often should take the least amount of typing, and functions should act like the most common behavior. This saves quite a bit of work.
For instance, the split has many defaults because there are some use cases where leaving things off uses the common case. With no arguments, split breaks up $_ on whitespace because that's a very common use.
my #bits = split;
A bit less common but still frequent case is to break up $_ on something else, so there's a slightly longer version of that:
my #bits = split /:/;
And, if you wanted to be explicit about the data source, you can specify the variable too:
my #bits = split /:/, $line;
Think of this as you would normally deal with life. If you have a common task that you perform frequently, like talking to your bartender, you have a shorthand for it the covers the usual case:
The usual
If you need to do something, slightly different, you expand that a little:
The usual, but with onions
But you can always note the specifics
A dirty Bombay Sapphire martini shaken not stirred
Think about this the next time you go through a website. How many clicks does it take for you to do the common operations? Why are some websites easy to use and others not? Most of the time, the good websites require you to do the least amount of work to do the common things. Unlike my bank which requires no fewer than 13 clicks to make a credit card bill payment. It should be really easy to give them money. :)
This doesn't answer the whole question, but in regards to writing code you won't be able to read in a couple days, here's a few languages that will encourage you to write short, virtually unreadable code:
J
K
APL
Golfscript
Perl has a lot of single character special variables that provide a lot of shortcuts eg $. $_ $# $/ $1 etc. I think it's that combined with the built in regular expressions, allows you to write some very concise but unreadable code.
Perl's special variables ($_, $., $/, etc.) can often be used to make code shorter (and more obfuscated).
I'd guess that the "secret" is in providing native operations for often repeated tasks.
In the domain that perl was originally envisioned for you often have to
Take input linewise
Strip off whitespace
Rip lines into words
Associate pairs of data
...
and perl simple provided operators to do these things. The short variable names and use of defaults for many things is just gravy.
Nor was perl the first language to go this way. Many of the features of perl were stolen more-or-less intact (or often slightly improved) from sed and awk and various shells. Good for Larry.
Certainly perl wasn't the last to go this way, you'll find similar features in python and php and ruby and ... People liked the results and weren't about to give them up just to get more regular syntax.
What's Java's secret of copying a variable in only one line, without worrying about buses and memory? Answer: the code is transformed to bigger code. Same for every language ever invented.
In Perl, when you have a nested data structure, it is permissible to omit de-referencing arrows to 2d and more level of nesting. In other words, the following two syntaxes are identical:
my $hash_ref = { 1 => [ 11, 12, 13 ], 3 => [31, 32] };
my $elem1 = $hash_ref->{1}->[1];
my $elem2 = $hash_ref->{1}[1]; # exactly the same as above
Now, my question is, is there a good reason to choose one style over the other?
It seems to be a popular bone of stylistic contention (Just on SO, I accidentally bumped into this and this in the space of 5 minutes).
So far, almost none of the usual suspects says anything definitive:
perldoc merely says "you are free to omit the pointer dereferencing arrow".
Conway's "Perl Best Practices" says "whenever possible, dereference with arrows", but it appears to only apply to the context of dereferencing the main reference, not optional arrows on 2d level of nested data structures.
"Mastering Perl for Bioinfirmatics" author James Tisdall doesn't give very solid preference either:
"The sharp-witted reader may have
noticed that we seem to be omitting
arrow operators between array
subscripts. (After all, these are
anonymous arrays of anonymous arrays
of anonymous arrays, etc., so
shouldn't they be written
[$array->[$i]->[$j]->[$k]?) Perl
allows this; only the arrow operator
between the variable name and the
first array subscript is required. It
make things easier on the eyes and
helps avoid carpal tunnel syndrome. On
the other hand, you may prefer to keep
the dereferencing arrows in place, to
make it clear you are dealing with
references. Your choice."
UPDATED "Intermediate Perl", as per its co-author brian d foy, recommends omitting the arrows. See brian's full answer below.
Personally, I'm on the side of "always put arrows in, since it's more readable and obvious they're dealing with a reference".
UPDATE To be more specific re: readability, in case of a multi-nested expression where subscripts themselves are expressions, the arrows help to "visually tokenize" the expressions by more obviously separating subscripts from one another.
Unless you really enjoy typing or excessively long lines, don't use the arrows when you don't need them. Subscripts next to subscripts imply references, so the competent programmer doesn't need extra clues to figure that out.
I disagree that it's more readable to have extra arrows. It's definitely unconventional to have them moving the interesting parts of the term further away from each other.
In Intermediate Perl, where we actually teach references, we tell you to omit the unnecessary arrows.
Also, remember there is no such thing as "readability". There is only what you (and others) have trained your eyes to recognize as patterns. You don't read things character-by-character then figure out what they mean. You see groups of things that you've seen before and recognize them. At the base syntax level that you are talking about, your "readability" is just your ability to recognize patterns. It's easier to recognize patterns the more you use it, so it's not surprising that what you do now is more "readable" to you. New styles seem odd at first, but eventually become more recognizable, and thus more "readable".
The example you give in your comments isn't hard to read because it lacks arrows. It's still hard to read with arrows:
$expr1->[$sub1{$x}]{$sub2[$y]-33*$x3}{24456+myFunct($abc)}
$expr1->[$sub1{$x}]->{$sub2[$y]-33*$x3}->{24456+myFunct($abc)}
I write that sort of code like this, using these sorts of variable names to remind the next coder about the sort of container each level is:
my $index = $sub1{$x};
my $key1 = $sub2[$y]-33*$x3;
my $key2 = 24456+myFunct($abc);
$expr1->[ $index ]{ $key1 }{ $key2 };
To make that even better, hide the details in a subroutine (that's what they are there for :) so you never have to play with that mess of a data structure directly. This is more readable that any of them:
my $value = get_value( $index, $key1, $key2 );
my $value = get_value(
$sub1{$x},
$sub2[$y]-33*$x3,
24456+myFunct($abc)
);
Since the -> arrow is non-optionally used for method calls, I prefer to only use it to call code. So I would use the following:
$object->method;
$coderef->();
$$dispatch{name}->();
$$arrayref[1];
$$arrayref[1][5];
#$arrayref[1 .. 5];
#$arrayref;
$$hashref{foo};
$$hashref{foo}{bar};
#$hashref{qw/foo bar/};
%$hashref;
Two sigils back to back always means a dereference, and the structure remains consistent across all forms of dereferencing (scalar, slice, all).
It also keeps all parts of the variable "together" which I find more readable, and it's shorter :)
I have always written all of the arrows. I agree with you, they separate better the different subscripts. Plus I use curly braces for regular expressions, so to me {foo}{bar} is a substitution: s{foo}{bar} stands out more from $s->{foo}->{bar} than from $s->{foo}{bar}.
I don't think it's a big thing though, reading code that omits the extra arrows is not a problem (as opposed to any indentation that's not the one I use ;--)
Looking through the perlsub and perlop manpages I've noticed that there are many references to "magic" and "magical" there (just search any of them for "magic"). I wonder why is Perl so rich in them.
Some examples:
print ++($foo = 'zz') # prints 'aaa'
printf "%d: %s", $! = 1, $! # prints '1: Operation not permitted'
while (my $line = <FH>) { ... } # $line is tested for definedness, not truth
use warnings; print "0 but true" + 1 # "0 but true" is a valid number!
When a Perl feature is described as "magic":
It means that that feature is
implemented by NBA star Magic Johnson.
Whenever Perl executes "magic", it is
actually sending an RPC call to a
remote receiver implanted in Magic
himself. He computes the answer, and
then sends a return message. The use
of Mr. Johnson for all the hard parts
of Perl provides a great abstraction
layer and simplifies porting to new
platforms. It's way easier than, say,
the Apache Portable Runtime.
Source: perrin on Perl Monks
It's official! Perl is more magical.
Hits from the following Google searches:
25 site:ruby-doc.org magic
36 site:docs.python.org magic
497 site:perldoc.perl.org magic
Magic, in Perl parlance is simply the word given to attributes applied to variables / functions that allow an extension of their functionality. Some of this functionality is available directly from Perl, and some requires the use of the C api.
A perfect example of magic is the tie interface which allows you to define your own implementation of a variable. Every operation that can be done to a variable (fetching or storing a value for instance) is exposed for reimplementation, allowing for elegant and logical syntactic constructs like a hash with values stored on disk, which are transparently loaded and saved behind the scenes.
Magic can also refer to the special ways that certain builtins can behave, such as how the first argument to map or grep can either be a block or a bare expression:
my #squares = map {$_**2} 1 .. 10;
my #roots = map sqrt, 1 .. 10;
which is not a behavior available to user defined subroutines.
Many other features of Perl, such as operator overloading or variables that can return different values when used with numeric or string operators are implemented with magic. Context could be seen as magic as well.
In a nutshell, magic is any time that a Perl construct behaves differently than a naive interpretation would suggest, an exception to the rule. Magic is of course very powerful, and should not be wielded without great care. Magic Johnson is of course involved in the execution of all magic (see FM's answer), but that is beyond the scope of this explaination.
I wonder why is Perl so rich in them.
To make things easy.
You'll find that most "magic" in Perl is to simplify the syntax for common tasks.
Because perl always Does What I Mean for some values of always.
I think (opinion more than fact) that this has to do with the organic growth viewpoint that Perl's creator Larry Wall has with the Perl language. Python is a study in the opposite approach, whose style often makes Perl hackers cringe at the perception of being forced to conform to a stylistic regime.
Some of it has to do with Perl being designed to be "efficient" at writing quick scripts to do Perl*-ish* tasks, in both wall clock time, and in keystrokes. Some of it has to do with the TMTOWTDI mantra of Perl and its followers.
Programmers tend to be opinionated about Perl's frequent usage of "magic", for some it is an eye-straining visual cacophony of chaos and disrespect for orderliness (which harkens back to the days of computer Priesthood in white lab coats behind a glass window), for others it is a shining example of getting things done efficiently, if not always obviously to the novice or outsider.
Perl's design philosophy is that simple things must be simple. This sounds good,and to some extent it is. However, there's a tradeoff involved: Making every simple thing a one-liner results in tons of special case hacks to save a few lines of code. Different people have different preferences regarding making simple operations within a language simple versus making the language specification simple. Perl is at one extreme. Java is at the other, at least among languages that people actually use. Python and C# are somewhere in between.