Unicode is allowed in identifiers enclosed in backticks:
val `💾id` = "1"
But a slash is not allowed:
val `application/json` = "application/json"
In Scala, such names are allowed.
This is a JVM limitation. From section 4.2.2 of the JVM specification:
Names of methods, fields, local variables, and formal parameters are stored as unqualified names. An unqualified name must contain at least one Unicode code point and must not contain any of the ASCII characters . ; [ / (that is, period or semicolon or left square bracket or forward slash).
In Scala, names are mangled to avoid this limitation; in Kotlin they are not.
Kotlin's identifiers are used as-is, without any mangling, in the names of JVM classes and methods generated from the Kotlin code. The slash has a special meaning in JVM names (it separates packages and class names). Therefore, Kotlin doesn't allow using it in an identifier.
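As a rough illustration of the mangling mentioned above, the Scala standard library exposes its operator-character encoding through scala.reflect.NameTransformer. The sketch below only shows how a name containing a slash is rewritten into something the JVM accepts; the object name ManglingSketch is made up for the example, and the exact names the compiler emits for a given definition may carry additional prefixes or suffixes.

```scala
import scala.reflect.NameTransformer

// Hypothetical wrapper object, used only for this illustration.
object ManglingSketch {
  // A name that is legal in Scala source when written in backticks.
  val sourceName: String = "application/json"

  // The slash is an operator character, so it is replaced by a "$div" marker.
  val encoded: String = NameTransformer.encode(sourceName)

  // Decoding reverses the substitution.
  val decoded: String = NameTransformer.decode(encoded)
}
```

Because the encoded form contains no period, semicolon, left square bracket, or slash, it satisfies the JVM rule quoted above.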
Related
I haven't been able to find in the Drools documentation which characters (beyond letters of the alphabet) are allowed or disallowed in a rule name. Does anyone know, or have a reference?
The only relevant section of Drools doc I've found so far does not specify:
Each rule must have a unique name within the rule package. If you use the same rule name more than once in any DRL file in the package, the rules fail to compile. Always enclose rule names with double quotation marks (rule "rule name") to prevent possible compilation errors, especially if you use spaces in rule names.
I think I have discovered, anecdotally, that some "grouping" characters do not work in rule names (rules named with them seem not to be found or included), or at least not in extension rules: the extended rule seems to work with grouping characters, but its extension does not (example below). The grouping characters include parentheses "()", square brackets "[]", and curly braces "{}". Less-than and greater-than "<>" do work, so for now I'm replacing the former with the latter.
Or are there escape chars for the problematic grouping chars?
Example:
rule "(grouping chars, and commas, work here)"
when
// conditions LHS
then
end
// removing the parentheses from the rule name below,
// or replacing them with < >, makes it work
rule "(grouping chars DON'T work here)"
extends "(grouping chars, and commas, work here)"
when
then
// consequences RHS
end
I haven't yet determined either way for other characters (for example, other punctuation), except that commas "," work. It would be nice to know ahead of time which characters are allowed.
Theoretically every identifier inside a string should work, but you might have empirically found some combination that is breaking the grammar somehow.
Thanks for the investigation. I've filed a Jira issue; please take a look at it.
Our application relies on lots of equations, which, to correspond with the standard scientific names, use variable names like mu_k (if the standard is $\mu_k$). (We could debate whether scientists should switch to CS-style descriptive variable names, but often the terms don't really describe anything, they are just part of equations, and, moreover, we need our code to match the known literature.)
In C it is easy to name variables this way: int mu_k. We are considering porting our code to Scala, but I know that val mu_k is discouraged in Scala, because underscores have special meanings.
If we use underscores only in the middle of the var name (e.g. mu_k) and not beginning or end (e.g. _x or x_), will this present a problem in Scala?
What is the recommended naming convention for Scala in this case?
You are right that underscores are discouraged in variable names in Scala, which implies that they are not forbidden. In my opinion, a convention should be followed wherever sensible.
In the case of mathematical formulae, I disagree that the Greek letters don't convey a meaning; the meaning is not necessarily intuitively descriptive for non-mathematicians, but as you say, the reference to the usage in a paper may be meaningful and important. Therefore, sticking with the underscore won't hurt, although I would probably prefer a more Scala-style name such as muX when possible and meaningful. If you want a perfect answer, you might need to perform a usability test with your developers. In the specific example, I personally find mu_x more readable than muX, but that might differ among individuals.
I don't think the Scala compiler has a problem with underscores in the examples you described. Presumably, even leading and trailing underscores are fine, but should indeed be avoided strictly because they have a special meaning: http://docs.scala-lang.org/style/naming-conventions.html#methods.
Underscores are not special in any way in identifiers. There are a lot of special meanings for the underscore in Scala, but not in identifiers. (There is a special rule in identifiers that if you want to mix alphanumeric characters and operator characters in the same identifier, they have to be separated by an underscore, e.g. foo? is not a legal identifier, but foo_? is.)
So, there is no problem using an identifier with an underscore in it.
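A minimal sketch of both points, assuming nothing beyond the standard library: an underscore inside an alphanumeric identifier is ordinary, and an underscore is what lets operator characters follow alphanumeric ones. The names UnderscoreNames, normalForce, friction and empty_? are invented for the example.

```scala
// Hypothetical object and member names, for illustration only.
object UnderscoreNames {
  // An underscore in the middle of an alphanumeric identifier is unremarkable.
  val mu_k: Double = 0.5
  val normalForce: Double = 10.0
  val friction: Double = mu_k * normalForce

  // Alphanumeric and operator characters may be mixed only when an
  // underscore separates them: empty? would not parse, empty_? does.
  def empty_?(xs: List[Int]): Boolean = xs.isEmpty
}
```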
It is generally preferred to use camelCase and PascalCase for alphanumeric identifiers, and not mix alphanumeric and operator characters in the same identifier (i.e. use maxBy instead of max_by and use isFoo instead of foo_?) but that's just a coding convention whose purpose is to reduce the number of "unspecial" underscores, so that you can quickly scan for the "special" ones.
But in your case, you are using special naming conventions anyway, so you don't need to adhere to the community naming conventions as strictly.
However, I personally would actually prefer the name µ_k over mu_k.
That's as far as it goes with Scala, unfortunately. The Fortress programming language by Sun/Oracle did allow boldface, overstrike, superscripts and subscripts in identifier names, so something like µₖ (µ with a subscript k) would have been possible as a legal identifier, but sadly, Fortress was abandoned a couple of years ago.
I'm not saying this is the correct way, and I would rather discourage doing it, but you can use full string literals as identifiers:
From: http://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html
id ::= plainid
     | ‘`’ stringLiteral ‘`’
Finally, an identifier may also be formed by an arbitrary string
between back-quotes (host systems may impose some restrictions on
which strings are legal for identifiers). The identifier then is
composed of all characters excluding the backquotes themselves.
So this is valid:
val `mu k` = 1.0
Just saw an example that looks like the following:
val b_* = grater[Book].asObject(dbo)
What is the significance of the asterisk in b_* here? What's the name for it in Scala, and what effect does it have on the outcome of b_?
The asterisk is valid in Scala variable and value names, as are many other characters that are not allowed in identifier names in Java or other C-like languages. See Valid identifier characters in Scala for more info.
However, just because it can be done doesn't mean it should be done. To my eye, it's not obvious at all what this value represents.
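To make the point concrete, here is a small sketch in which b_* is nothing more than the name of a value. The object StarIdentifier and the numbers are invented for the example; the original line's grater[Book] call is not reproduced.

```scala
// Hypothetical example: b_* is an ordinary identifier, not an operator.
object StarIdentifier {
  val b_* = 41

  // Whitespace matters here: written without spaces, b_*+1 would be
  // lexed as the single identifier b_*+ followed by 1.
  val next: Int = b_* + 1
}
```

The need for that whitespace is one practical reason such names are easy to misread.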
First of all, as I understand it, variable identifiers are called symbols in Common Lisp.
I noted that while in languages like C variable identifiers can only contain alphanumerics and underscores, Common Lisp allows many more characters, like "*" and (at least in Scheme) "?".
So, what I want to know is: what exactly is the full set of characters that Common Lisp allows in a symbol (or variable identifier, if I'm wrong)? Is it the same for Scheme?
Also, is the set of characters different for function names?
I've been googling, looking in the CLHS, and in Practical Common Lisp, and for the life of me, something must be wrong because I can't seem to find the answer.
A detailed answer is a bit tricky. There is the ANSI standard for Common Lisp. It defines the set of available characters. Basically you can use all those defined characters for symbols. See also Symbols as Tokens.
For example
|Polynom 2 * x ** 3 - 5 * x ** 2 + 10|
is a valid symbol. Note that the vertical bars mark the symbol and do not belong to the symbol name.
Then there are the existing implementations of Common Lisp and their support of various character sets and string types. So several support Unicode (or similar) and allow Unicode characters in symbol names.
LispWorks:
CL-USER 1 > (list 'δ 'ψ 'σ '\|)
(δ ψ σ \|)
[From a Schemer's perspective. Even though some concepts in Scheme and Common Lisp have the same name, it does not mean that they mean the same thing in the two languages.]
First note that symbols and identifiers are two different things.
Symbols can be thought of as strings which support fast equality comparison.
Two symbols s and t are equal (more or less) if they are spelled the same way. The operation string=? needs to loop over the characters in the two strings and see if they are all alike. This takes time proportional to the length of the shortest string. Symbols, on the other hand, are automatically (by the runtime system) put into a table (typically a hash table). Therefore symbol=? boils down to a simple pointer comparison and is thus very fast. Symbols are often used in cases where one would use enumerations in C.
Symbols are values that can be present at runtime.
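The interning idea can be sketched outside of Scheme as well. The following is a toy model in Scala, not how any particular Scheme runtime is implemented: Sym, intern and sameSymbol are hypothetical names, and the table is a plain mutable hash map with no thread safety.

```scala
import scala.collection.mutable

// Toy model of symbol interning; all names here are hypothetical.
object SymbolSketch {
  // The private constructor forces every Sym to be created via intern.
  final class Sym private (val name: String)

  object Sym {
    private val table = mutable.HashMap.empty[String, Sym]

    // Return the unique Sym for this spelling, creating it on first use.
    def intern(name: String): Sym =
      table.getOrElseUpdate(name, new Sym(name))
  }

  // Equality of interned symbols is a reference (pointer) comparison.
  def sameSymbol(a: Sym, b: Sym): Boolean = a eq b
}
```

The cost of comparing characters is paid once, at interning time; afterwards every comparison is constant time.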
Identifiers are simply names of variables in a program.
Now if said program is to be represented as a Scheme value, one choice would be to use symbols to represent identifiers, but that does not mean symbols are identifiers (or vice versa). A better representation of identifiers (still in Scheme) is syntax objects, which besides the name of the identifier also record where the identifier was read (or constructed). Say you encounter an undefined variable and want to signal where in the program the undefined variable is; then it is very convenient that the source location is part of the representation of the identifier.
Last but not least: what are the legal characters of an identifier? Here it is best to quote chapter and verse from R6RS:
4.2.4 Identifiers
Most identifiers allowed by other programming languages are also
acceptable to Scheme. In general, a sequence of letters, digits, and
“extended alphabetic characters” is an identifier when it begins with
a character that cannot begin a representation of a number object. In
addition, +, -, and ... are identifiers, as is a sequence of letters,
digits, and extended alphabetic characters that begins with the
two-character sequence ->. Here are some examples of identifiers:
lambda q soup
list->vector + V17a
<= a34kTMNs ->-
the-word-recursion-has-many-meanings
Extended alphabetic characters may be used within identifiers as if
they were letters. The following are extended alphabetic characters:
! $ % & * + - . / : < = > ? @ ^ _ ~
Moreover, all characters whose Unicode scalar values are greater than
127 and whose Unicode category is Lu, Ll, Lt, Lm, Lo, Mn, Mc, Me, Nd,
Nl, No, Pd, Pc, Po, Sc, Sm, Sk, So, or Co can be used within
identifiers. In addition, any character can be used within an
identifier when specified via an <inline hex escape>. For
example, the identifier H\x65;llo is the same as the identifier
Hello, and the identifier \x3BB; is the same as the identifier
λ.
Any identifier may be used as a variable or as a syntactic keyword
(see sections 5.2 and 9.2) in a Scheme program. Any identifier may
also be used as a syntactic datum, in which case it represents a
symbol (see section 11.10).
From: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_sec_4.2.4
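As a rough aid to reading the quoted rule, the sketch below checks only its ASCII part: letters, digits and the extended alphabetic characters as R6RS lists them, the special cases +, - and ..., and the -> prefix. It is a simplification, not the R6RS grammar: Unicode categories and inline hex escapes are ignored, and "cannot begin a number" is approximated as "is not a digit, sign or dot". All names are hypothetical.

```scala
// Simplified, ASCII-only reading of the quoted R6RS rule (an approximation).
object SchemeIdentifierSketch {
  // Extended alphabetic characters from R6RS section 4.2.4.
  private val extended: Set[Char] = "!$%&*+-./:<=>?@^_~".toSet

  private def constituent(c: Char): Boolean =
    (c < 128 && c.isLetterOrDigit) || extended(c)

  // Assumption: a number representation may start with a digit, sign or dot.
  private def canBeginNumber(c: Char): Boolean =
    c.isDigit || c == '+' || c == '-' || c == '.'

  def isIdentifier(s: String): Boolean =
    s == "+" || s == "-" || s == "..." ||
      (s.nonEmpty && s.forall(constituent) &&
        (s.startsWith("->") || !canBeginNumber(s.head)))
}
```

Under these assumptions the examples from the quote (lambda, list->vector, <=, ->- and so on) are accepted, while a token such as 17a is rejected.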
See Chapter 2 of the CLHS, which describes the reader algorithm in detail. But the simple answer is that if a token isn't a readmacro invocation (section 2.4), and isn't a number or all dots, it defaults to being interpreted as a symbol.
From Programming in Scala section 6.10 (Page 151):
Identifiers in user programs should not contain the '$' character, even though it will compile; if they do, this might lead to name clashes with identifiers generated by the Scala compiler.
I am sure there's a reason for this, but why not prevent use of the '$' character in alphanumeric identifiers?
Some of the identifiers generated internally by the Scala compiler contain '$' characters. If you create new identifiers with '$' characters, you might clash with the internally generated identifiers, and chaos ensues. OTOH, you sometimes need to use '$' characters, either on those (now very rare) occasions when access to the internally generated Scala identifiers is necessary, or because someone used such an identifier in Java code you wish to call (where it's legal, if also discouraged).
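One easy way to see a compiler-generated '$' is to look at the JVM class name behind a Scala object. This is a small sketch with made-up names (DollarSketch, Registry); the exact prefix of the class name depends on where the code is compiled, but the trailing '$' of an object's class is the part of interest.

```scala
// Hypothetical names; only the '$' in the generated class name matters.
object DollarSketch {
  object Registry

  // The class backing a Scala object is given a name ending in '$'.
  val generatedName: String = Registry.getClass.getName
}
```

A user-defined class literally named Registry$ in the same scope would be competing for that generated name, which is the kind of clash the book warns about.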