Perl functions naming convention for long names - perl

I have read some articles online on the naming convention for Perl style which suggest using lowercase letters and separating words by underscore for functions or methods names. Some others use the first word in lowercase then capitalize the other words. Of course Windows .NET etc Capitalize every word and no underscore.
I have some packages methods many words like entriesoncurrentpage, if I follow Perl style suggested I should do it like:
sub entries_on_current_page {...}
this added four underscore letters to the method name, the other style is:
sub entriesOnCurrentPage {...}
and Windows style should be:
sub EntriesOnCurrentPage {...}
PHP sometimes uses all lowercase with underscore like mysql_real_escape_string() and sometimes uses all lowercase without underscore like htmlspecialchars, of course PHP function names are not case sensitive so this feature is not supported in Perl.
So the question is, for the long name with many words what is the best style to use for Perl coding.

Originally, most Perl developers used camel casing with the first letter lowercased. This is the standard with most programming languages. Names with first letter capitalized were used for classes and methods.
Later on, Damian Conway's book Perl Best Practices suggested using underscores rather than camel casing. Damian argued that it increased readability, and was not that much harder to type.
Damian Conway's suggestion on names became the standard because 1). He was correct. It's much more legible and isn't that much harder to type, and most importantly 2). It was incorporated into Perltidy. Perltidy is a program that helps standardize your code according to Damian's suggestions. Perltidy is much like CheckStyle in Java.
Are these arbitrary standards? Yes, all standards are somewhat arbitrary in nature. You have a few candidate suggestions for rules, and you must make a decision:
Should the curly brace in while loops and if statements be appended on the end of the line, or go under the while or if statement. In standerd C style, curly braces are cuddled. In Java, they're not suppose to be according to CheckStyle. In Kornshell, the then goes under the if. In Bash, the standard is now that the then goes on the same line even though the Bash interpreter doesn't really like it. (You have to add a semicolon before the then because it's considered a separate command.
How should variable names be done. In most languages, CamelCase rules. In .NET, you even capitalize the first character, but in Perl, we use underscores.
Should constants be all uppercase? Most languages have agreed with that. However, in shell script, you usually reserve all uppercase variable names for special environment variables such as $PWD, $PATH, etc. Therefore, in Bash and Kornshell scripts, constant variables are all camelCased like regular variables.
The idea is to follow the standard for that language. Why? Because the standard says so. Because you can't argue with The Standard as you can with your fellow programmers whether or not curly braces are cuddled or not. The main this is to realize that most standards may be somewhat arbitrary, but they don't really affect the way you program. By everyone following a standard, you make it easier to understand other people's code.

Related

Name of the notation that goes like /<command> [arg0|arg1]

I know there is a notation or convention that for example describes the usage of a command (in a shell for example).
/<command> [arg0|arg1]
means the following is a right way of expressing/using the command: /TheNameOfTheCommand arg0 or /TheNameOfTheCommand arg1.
It is a bit like RegEx or a formal language. Minecraft uses this notation too to describe the syntax of their command. And I once heard about it by a professor in a programming lecture. That's the reason I think it must be a real convention.
Do you know the name of this convention or does it exist at all?
It's a convention, but not a standard. Or maybe it would be more accurate to say that it is a convention adapted from a collection of standards which differ in details, except that the standards derive from the conventional use, and it's more common to find other variants than strict application of the standards.
The use of angle brackets to delimit grammatical variables goes back to Peter Backus' notation to describe the original Algol (1959); the use of brackets to donate optionality and vertical bars to list alternatives was used in the definition of Pascal (1974) and promoted by Niklaus Wirth in a note published in 1977 (“What Can We Do About the
Unnecessary Diversity of Notation for
Syntactic Definitions”).
In the Pascal report, it was called "Extended Backus Naur Form", and it is one of a number of similar notations which go by that name. I think that's unfortunate because it doesn't acknowledge Wirth's contribution but if you called it WBNF people would probably think you were talking about a radio station.
(Wirth didn't always use angle brackets. In the published version of the Pascal report, grammatical variables were printed on italics, but in the widely-distributed typed manuscript, angle brackets were used. Similarly, literal tokens were sometimes typeset in boldface and sometimes typed surrounded by quotation marks.)

Mathematical formula terms in Scala

Our application relies on lots of equations, which, to correspond with the standard scientific names, use variable names like mu_k, (if the standard is $\mu_k$). (We could debate whether scientists should switch to CS style descriptive variable names, but often the terms don't really describe anything, they are just part of equations, and, more over, we need our code to match the known literature.)
In C this is easy to name vars this way: int mu_k. We are considering porting our code to Scala, but I know that val mu_k is discouraged in Scala, because underscores have special meanings.
If we use underscores only in the middle of the var name (e.g. mu_k) and not beginning or end (e.g. _x or x_), will this present a problem in Scala?
What is the recommended naming convention for Scala in this case?
You are right that underscores are discouraged in variable names in Scala, which implies that they are not forbidden. In my opinion, a convention should be followed wherever sensible.
In the case of mathematical formulae, I disagree that the Greek letters don't convey a meaning; the meaning is not necessarily intuitively descriptive for non-mathematicians, but as you say, the reference to the usage in a paper may be meaningful and important. Therefore, sticking with the underscore won't hurt, although I would probably prefer a more Scala-style way as muX when possible and meaningful. If you want a perfect answer, you might need to perform a usability test with your developers. In the specific example, I personally find mu_x more readable than muX, but that might differ among individuals.
I don't think the Scala compiler has a problem with underscores in the examples you described. Presumably, even leading and trailing underscores are fine, but should indeed be avoided strictly because they have a special meaning: http://docs.scala-lang.org/style/naming-conventions.html#methods.
Underscores are not special in any way in identifiers. There are a lot of special meanings for the underscore in Scala, but not in identifiers. (There is a special rule in identifiers that if you want to mix alphanumeric characters and operator characters in the same identifier, they have to be separated by an underscore, e.g. foo? is not a legal identifier, but foo_? is.)
So, there is no problem using an identifier with an underscore in it.
It is generally preferred to use camelCase and PascalCase for alphanumeric identifiers, and not mix alphanumeric and operator characters in the same identifier (i.e. use maxBy instead of max_by and use isFoo instead of foo_?) but that's just a coding convention whose purpose is to reduce the number of "unspecial" underscores, so that you can quickly scan for the "special" ones.
But in your case, you are using special naming conventions anyway, so you don't need to adhere to the community naming conventions as strictly.
However, I personally would actually prefer the name µ_k over mu_k.
That's as far as it goes with Scala, unfortunately. The Fortress programming language by Sun/Oracle did allow boldface, overstrike, superscripts and subscripts in identifier names, so something like µk would have been possible as a legal identifier, but sadly, Fortress was abandoned a couple of years ago.
I'm not stating this is the correct way, and myself would be rather discouraged to do this, but you can use full string literals as identifiers:
From: http://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html
id ::= plainid
| ‘’ stringLiteral ‘’
Finally, an identifier may also be formed by an arbitrary string
between back-quotes (host systems may impose some restrictions on
which strings are legal for identifiers). The identifier then is
composed of all characters excluding the backquotes themselves.
So this is valid:
val ’mu k‘
(sorry, for formatting)

Which langages let you use fully customised lexems, including keywords and all symbol defined in their grammar?

I wish to code fully with Esperanto lexemes. That is, not ending up with a English/Esperanto mix up. Perligata is a good example of the kind of result I would like, but it use Latin where I want to use Esperanto.
So Perl seems to be a valid answer to my question. On the other end a language like Python have no mechanism that would let you use se (if in esperanto) rather than if. On what you may call middle ground, you have languages like C that allow to replace keywords through its processor (#define se if), but won't allow you to get ride of the define keywords itself. Also you have languages like racket and the LISP-family that would probably let you use wrap most internal symbols, but probably not allow you to easily change parentheses for anything else. For example mapping ( with ene and ) with ele.
Also an other point is ability to use unicode in identifiers, as Esperanto do use non-ASCII characters in its alphabet, like ĉ. That's not really a blocking element, as one is available to use cx instead of ĉ, but its nevertheless an interesting parameter.
So I guess an ideal answer to this question would be a matrix of languages specifying their lexeme and grammar customability.
each formal language has its syntax. in my opinion, lowest 'syntax overhead' is offered by lisp-like languages. but then you don't want to have parenthesis. you also don't want to have #define therefore you reject any syntax at all and all possible replacements.
i don't think there is any language that will let you do it. you should look for language generators, write your own language (at least the syntax part) or the simplest possible way, add your own find-replace layer on top of any existing language

Programming in a language other than English

I was having a discussion on twitter about adding the ability of Ruby to use λ instead of lambda, and more generally about Unicode support. I realized that all the languages I know work only with English reserved words and mostly assume a us-en keyboard (for example using $ instead of £ or ¥). While some languages are now starting to have some support for Unicode in there string functions, there are still so many convention based on English or the Latin style character set. For example Ruby requires class names begin with an upper case letter, but upper and lower case is not a property of glyphs in most scripts.
So the question is: "Are there programming languages that work in a large set of languages, and how do they do it?"
You can have a look ant the APL programming language, for example.
Some languages define very simple syntaxes and little or no keyworks. For example, LISPs and languages that function like them (Tcl, etc...) where everything is "command arg1 ... argn". These languages, since there are no keywords per se, are language agnostic.
For example, in Tcl, you can rename the various commands to use whatever language you want and everything should work perfectly.
Python 3 is completely Unicode-based, so identifiers can be constructed out of any Unicode letters/digits etc.
It's still not a good idea to use characters for function names that programmers from other nations don't have easy access to on their keyboards.
In the 3.0.0 release of the Parrot VM, they added support for a language, Ωη;)XD that is named using unicode which caused all kinds of breakage for the VM. It might be worth taking a look at.

Are quotes around hash keys a good practice in Perl?

Is it a good idea to quote keys when using a hash in Perl?
I am working on an extremely large legacy Perl code base and trying to adopt a lot of the best practices suggested by Damian Conway in Perl Best Practices. I know that best practices are always a touchy subject with programmers, but hopefully I can get some good answers on this one without starting a flame war. I also know that this is probably something that a lot of people wouldn't argue over due to it being a minor issue, but I'm trying to get a solid list of guidelines to follow as I work my way through this code base.
In the Perl Best Practices book by Damian Conway, there is this example which shows how alignment helps legibility of a section of code, but it doesn't mention (anywhere in the book that I can find) anything about quoting the hash keys.
$ident{ name } = standardize_name($name);
$ident{ age } = time - $birth_date;
$ident{ status } = 'active';
Wouldn't this be better written with quotes to emphasize that you are not using bare words?
$ident{ 'name' } = standardize_name($name);
$ident{ 'age' } = time - $birth_date;
$ident{ 'status' } = 'active';
Without quotes is better. It's in {} so it's obvious that you are not using barewords, plus it is both easier to read and type (two less symbols). But all of this depends on the programmer, of course.
When specifying constant string hash keys, you should always use (single) quotes. E.g., $hash{'key'} This is the best choice because it obviates the need to think about this issue and results in consistent formatting. If you leave off the quotes sometimes, you have to remember to add them when your key contains internal hypens, spaces, or other special characters. You must use quotes in those cases, leading to inconsistent formatting (sometimes unquoted, sometimes quoted). Quoted keys are also more likely to be syntax-highlighted by your editor.
Here's an example where using the "quoted sometimes, not quoted other times" convention can get you into trouble:
$settings{unlink-devices} = 1; # I saved two characters!
That'll compile just fine under use strict, but won't quite do what you expect at runtime. Hash keys are strings. Strings should be quoted as appropriate for their content: single quotes for literal strings, double quotes to allow variable interpolation. Quote your hash keys. It's the safest convention and the simplest to understand and follow.
I never single-quote hash keys. I know that {} basically works like quotes do, except in special cases (a +, and double-quotes). My editor knows this too, and gives me some color-based cues to make sure that I did what I intended.
Using single-quotes everywhere seems to me like a "defensive" practice perpetrated by people that don't know Perl. Save some keyboard wear and learn Perl :)
With the rant out of the way, the real reason I am posting this comment...the other comments seem to have missed the fact that + will "unquote" a bareword. That means you can write:
sub foo {
$hash{+shift} = 42;
}
or:
use constant foo => 'OH HAI';
$hash{+foo} = 'I AM A LOLCAT';
So it's pretty clear that +shift means "call the shift function" and shift means "the string 'shift'".
I will also point out that cperl-mode highlights all of the various cases correctly. If it doesn't, ping me on IRC and I will fix it :)
(Oh, and one more thing. I do quote attribute names in Moose, as in has 'foo' => .... This is a habit I picked up from working with stevan, and although I think it looks nice... it is a bit inconsistent with the rest of my code. Maybe I will stop doing it soon.)
Quoteless hash keys received syntax-level attention from Larry Wall to make sure that there would be no reason for them to be other than best practice. Don't sweat the quotes.
(Incidentally, quotes on array keys are best practice in PHP, and there can be serious consequences to failing to use them, not to mention tons of E_WARNINGs. Okay in Perl != okay in PHP.)
I don't think there's a best practice on this one. Personally I use them in hash keys like so:
$ident{'name'} = standardize_name($name);
but don't use them to the left of the arrow operator:
$ident = {name => standardize_name($name)};
Don't ask me why, it's just the way I do it :)
I think the most important thing you can do is to always, always, always:
use strict;
use warnings;
That way the compiler will catch any semantic errors for you, leaving you less likely to mistype something, whichever way you decide to go.
And the second most important thing is to be consistent.
I go without quotes, just because it's less to type and read and worry about. The times when I have a key which won't be auto-quoted are few and far between so as not to be worth all the extra work and clutter. Perhaps my choice of hash keys have changed to fit my style, which is just as well. Avoid the edge cases entirely.
It is sort of the same reason I use " by default. It's more common for me to plop a variable in the middle of a string than to use a character that I don't want interpolated. Which is to say, I've more often written 'Hello, my name is $name' than "You owe me $1000".
At least, quoting prevent syntax highlighting reserved words in not-so-perfect editors. Check out:
$i{keys} = $a;
$i{values} = [1,2];
...
I prefer to go without quotes, unless I want some string interpolation. And then I use double quotes. I liken it to literal numbers. Perl would really allow you to do the following:
$achoo['1'] = 'kleenex';
$achoo['14'] = 'hankies';
But nobody does that. And it doesn't help with clarity, simply because we add two more characters to type. Just like sometimes we specifically want slot #3 in an array, sometimes we want the PATH entry out of %ENV. Single-quoting it add no clarity as far as I'm concerned.
The way Perl parses code makes it impossible to use other types of "bare words" in a hash index.
Try
$myhash{shift}
and you're only going to get the item stored in the hash under the 'shift' key, you have to do this
$myhash{shift()}
in order to specify that you want the first argument to index your hash.
In addition, I use jEdit, the ONLY visual editor (that I've seen--besides emacs) that allows you total control over highlighting. So it's doubly clear to me. Anything looking like the former gets KEYWORD3 ($myhash) + SYMBOL ({) + LITERAL2 (shift) + SYMBOL (}) if there is a paranthesis before the closing curly it gets KEYWORD3 + SYMBOL + KEYWORD1 + SYMBOL (()}). Plus I'll likely format it like this as well:
$myhash{ shift() }
Go with the quotes! They visually break up the syntax and more editors will support them in the syntax highlighting (hey, even Stack Overflow is highlighting the quote version). I'd also argue that you'd notice typos quicker with editors checking that you ended your quote.
It is better with quotes because it allows you to use special characters not permitted in barewords. By using quotes I can use the special characters of my mother tongue in hash keys.
I've wondered about this myself, especially when I found I've made some lapses:
use constant CONSTANT => 'something';
...
my %hash = ()
$hash{CONSTANT} = 'whoops!'; # Not what I intended
$hash{word-with-hyphens} = 'whoops!'; # wrong again
What I tend to do now is to apply quotes universally on a per-hash basis if at least one of the literal keys needs them; and use parentheses with constants:
$hash{CONSTANT()} = 'ugly, but what can you do?';
You can precede the key with a "-" (minus character) too, but be aware that this appends the "-" to the beginning of your key. From some of my code:
$args{-title} ||= "Intrig";
I use the single quote, double quote, and quoteless way too. All in the same program :-)
I've always used them without quotes but I would echo the use of strict and warnings as they pick out most of the common mistakes.