Do any programming languages support the ≥ character, Unicode U+2265, to mean greater than or equal to?

It is common in programming to use something like >= as the comparison operator meaning "is greater than or equal to". Unicode includes at least one specific character that represents this concept: ≥ (U+2265). Are there any programming languages that accept this character for use as a comparison operator?
Note: There probably exist programming languages in which a user can define ≥ as a comparison operator. That's certainly interesting, but I am asking about cases where the base language, some standard distribution of it, or some widely used package or library already has it so defined.

I don't know where you would find a comprehensive list of languages that support Unicode operators. One of the best known is Julia, which is typically used for numerical computing; it accepts ≥ out of the box as a synonym for >=.
There are a few others, like Perl 6 (now known as Raku), which also supports it, along with Scala.
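For illustration, here is a minimal Scala sketch. The enrichment class and its names are my own, not anything standard: Scala's lexer accepts mathematical-symbol characters like ≥ in operator identifiers, but the base language does not predefine this operator.

object GeqExample {
  // Hypothetical enrichment: gives Int a ≥ method that delegates to >=.
  implicit class GeqOps(x: Int) {
    def ≥(y: Int): Boolean = x >= y
  }
  def main(args: Array[String]): Unit = {
    println(3 ≥ 2)  // prints: true
  }
}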
Many modern languages support Unicode in identifiers but do not natively implement Unicode operators. It's arguably unnecessary, since many IDEs support font ligatures: a rendering feature that displays an operator such as >= as ≥.

Related

Mathematical formula terms in Scala

Our application relies on lots of equations which, to correspond with the standard scientific names, use variable names like mu_k (when the standard notation is $\mu_k$). (We could debate whether scientists should switch to CS-style descriptive variable names, but often the terms don't really describe anything, they are just part of equations; moreover, we need our code to match the known literature.)
In C this is easy to name vars this way: int mu_k. We are considering porting our code to Scala, but I know that val mu_k is discouraged in Scala, because underscores have special meanings.
If we use underscores only in the middle of the var name (e.g. mu_k) and not beginning or end (e.g. _x or x_), will this present a problem in Scala?
What is the recommended naming convention for Scala in this case?
You are right that underscores are discouraged in variable names in Scala, which implies that they are not forbidden. In my opinion, a convention should be followed wherever sensible.
In the case of mathematical formulae, I disagree that the Greek letters don't convey a meaning; the meaning is not necessarily intuitively descriptive for non-mathematicians, but as you say, the reference to the usage in a paper may be meaningful and important. Therefore, sticking with the underscore won't hurt, although I would probably prefer a more Scala-style way as muX when possible and meaningful. If you want a perfect answer, you might need to perform a usability test with your developers. In the specific example, I personally find mu_x more readable than muX, but that might differ among individuals.
I don't think the Scala compiler has a problem with underscores in the examples you described. Presumably even leading and trailing underscores are fine, but they should indeed be avoided because the underscore has special meanings elsewhere in the language: http://docs.scala-lang.org/style/naming-conventions.html#methods.
Underscores are not special in any way in identifiers. There are a lot of special meanings for the underscore in Scala, but not in identifiers. (There is a special rule in identifiers that if you want to mix alphanumeric characters and operator characters in the same identifier, they have to be separated by an underscore, e.g. foo? is not a legal identifier, but foo_? is.)
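A quick sketch of that rule (the method name is made up for illustration):

// Legal: the alphanumeric part and the operator part are separated
// by an underscore.
def empty_?(s: String): Boolean = s.isEmpty
// "def empty?(s: String)" would not compile: no separating underscore.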
So, there is no problem using an identifier with an underscore in it.
It is generally preferred to use camelCase and PascalCase for alphanumeric identifiers, and not mix alphanumeric and operator characters in the same identifier (i.e. use maxBy instead of max_by and use isFoo instead of foo_?) but that's just a coding convention whose purpose is to reduce the number of "unspecial" underscores, so that you can quickly scan for the "special" ones.
But in your case, you are using special naming conventions anyway, so you don't need to adhere to the community naming conventions as strictly.
However, I personally would actually prefer the name µ_k over mu_k.
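That spelling does work in Scala today; a minimal sketch (the physics names are only illustrative):

val µ_k = 0.4                       // coefficient of kinetic friction
val normalForce = 2.0 * 9.81        // N
val friction = µ_k * normalForce    // matches F = µ_k · N in the literature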
That's as far as it goes with Scala, unfortunately. The Fortress programming language by Sun/Oracle did allow boldface, overstrike, superscripts and subscripts in identifier names, so something like µk would have been possible as a legal identifier, but sadly, Fortress was abandoned a couple of years ago.
I'm not claiming this is the correct way, and I myself would rather discourage it, but you can use full string literals as identifiers:
From: http://www.scala-lang.org/files/archive/spec/2.11/01-lexical-syntax.html
id ::= plainid
     | '`' stringLiteral '`'
Finally, an identifier may also be formed by an arbitrary string
between back-quotes (host systems may impose some restrictions on
which strings are legal for identifiers). The identifier then is
composed of all characters excluding the backquotes themselves.
So this is valid:
val `mu k` = 1.0

Which languages let you use fully customised lexemes, including keywords and all symbols defined in their grammar?

I wish to code fully with Esperanto lexemes; that is, not ending up with an English/Esperanto mix. Perligata is a good example of the kind of result I would like, but it uses Latin where I want to use Esperanto.
So Perl seems to be a valid answer to my question. At the other end, a language like Python has no mechanism that would let you use se (Esperanto for if) rather than if. On what you might call middle ground, you have languages like C that let you replace keywords through the preprocessor (#define se if), but won't let you get rid of the #define keyword itself. You also have languages like Racket and the Lisp family that would probably let you wrap most internal symbols, but probably wouldn't let you easily change the parentheses to anything else, for example mapping ( to ene and ) to ele.
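For comparison, Scala sits on similar middle ground: library vocabulary (though not keywords like if) can be renamed at the import site. The Esperanto names below are my own illustration:

import scala.math.{max => maksimumo, min => minimumo}

object Ekzemplo {
  def main(args: Array[String]): Unit = {
    println(maksimumo(3, 7))  // prints: 7
    println(minimumo(3, 7))   // prints: 3
  }
}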
Another point is the ability to use Unicode in identifiers, as Esperanto uses non-ASCII characters in its alphabet, like ĉ. That's not really a blocking issue, as one can use cx instead of ĉ, but it's nevertheless an interesting parameter.
So I guess an ideal answer to this question would be a matrix of languages specifying how customisable their lexemes and grammars are.
Each formal language has its syntax. In my opinion, the lowest 'syntax overhead' is offered by Lisp-like languages, but then you don't want to have parentheses. You also don't want to have #define, so you are rejecting any syntax at all, along with all possible replacements.
I don't think there is any language that will let you do this. You should look at language generators, write your own language (at least the syntax part), or, the simplest possible way, add your own find-and-replace layer on top of an existing language.

Programming in a language other than English

I was having a discussion on Twitter about adding the ability for Ruby to use λ instead of lambda, and more generally about Unicode support. I realized that all the languages I know work only with English reserved words and mostly assume a US-English keyboard (for example, using $ instead of £ or ¥). While some languages are now starting to have some Unicode support in their string functions, there are still many conventions based on English or the Latin character set. For example, Ruby requires class names to begin with an upper-case letter, but upper and lower case is not a property of glyphs in most scripts.
So the question is: "Are there programming languages that work in a large set of languages, and how do they do it?"
You can have a look at the APL programming language, for example.
Some languages define very simple syntaxes and few or no keywords. For example, Lisps and languages that work like them (Tcl, etc.), where everything is "command arg1 ... argn". Since these languages have no keywords per se, they are language-agnostic.
For example, in Tcl you can rename the various commands to use whatever language you want, and everything should work perfectly.
Python 3 is completely Unicode-based, so identifiers can be constructed out of any Unicode letters/digits etc.
It's still not a good idea to name functions with characters that programmers from other nations cannot easily type on their keyboards.
In the 3.0.0 release of the Parrot VM, they added support for a language, Ωη;)XD, whose Unicode name caused all kinds of breakage for the VM. It might be worth taking a look at.

Which programming languages were designed with Unicode support from the beginning?

Which widely used programming languages were designed ground-up with Unicode support?
A lot of programming languages have added Unicode support as an afterthought in later versions, but which widely used languages were released with Unicode support from day one?
Java was probably the first popular language to have ground-up Unicode support.
Basically all of the .NET languages are Unicode languages, such as C# and VB.NET.
There were many breaking changes in Python 3, among them the switch to Unicode for all text.
So Python wasn't designed ground-up for Unicode, but Python 3 was.
I don't know how far this goes in other languages, but a fun thing about C# is that not only is the runtime (the string class, etc.) Unicode-aware, but Unicode is fully supported in source:
using משליט = System.Object;
using תוצאה = System.Int32;

public class שלום : משליט {
    public תוצאה בית() {
        int אלף = 0;
        for (int λ = 0; λ < 20; λ++) אלף += λ;
        return אלף;
    }
}
Google's Go programming language supports Unicode and works with UTF-8.
It really is difficult to design Unicode support into a programming language right from the beginning in a future-proof way.
Java is one of the languages that had this designed into the language specification. However, Unicode support in v1.0 of Java is different from that in v5 and v6 of the Java SDK. This is primarily due to the version of Unicode that the language specification catered to when the language was originally designed. Java attempts to track changes in the Unicode standard with every major release.
Early implementations of the JLS could claim Unicode support primarily because Unicode itself contained only 65,536 characters (v1.0 of Java supported Unicode 1.1, and Java v1.4 supported Unicode 3.0), which was compatible with the 16-bit storage space taken up by the char type. That changed with Unicode 3.1; it's an evolving standard, and more characters are usually added in each release. The characters added from 3.1 onward are called supplementary characters. Support for supplementary characters was added in Java 5 via JSR-204; Java 5 and 6 support Unicode 4.0.
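The practical consequence is easy to demonstrate on the JVM; here is a small sketch in Scala (the sample character is arbitrary):

object Supplementary {
  def main(args: Array[String]): Unit = {
    // U+1D54A (𝕊) lies outside the BMP, so it needs a surrogate pair.
    val s = "\uD835\uDD4A"
    println(s.length)                       // prints: 2 (UTF-16 code units)
    println(s.codePointCount(0, s.length))  // prints: 1 (one codepoint)
  }
}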
Therefore, don't be surprised if different programming languages implement Unicode support differently.
On the other hand, PHP(!!) and Ruby did not have Unicode support built in at inception.
PS: Support for v5.1 of Unicode is planned for Java 7.
Java and the .NET languages, as other commenters have pointed out, although Java's strings are UTF-16 rather than UCS or UTF-8. (At the time, it seemed like a sensible idea! Now clearly either UTF-8 or UCS would be better.) And Python 3 is really a different, incompatible language from Python 1.x and 2.x, so it qualifies too.
The Plan 9 languages, around 1992, were probably the first to do this: their dialect of C, rc, Alef, mk, ACID, and so on were all Unicode-enabled. They took the very simple approach that anything that wasn't ASCII was an identifier character. See their paper from 1993 on the subject. (This is the project where UTF-8 was invented, which meant they could do this in a pretty compatible way, in particular without plumbing binary-versus-text distinctions through all their programs.)
Other languages that support non-ASCII identifiers include current PHP.
Perl 6 has complete Unicode support from scratch (with the Rakudo Perl 6 compiler being the first implementation); see the general overview and the Unicode operators documentation.
Strings, regular expressions, and grammars all operate on graphemes, even for codepoint combinations that have no composed representation (an artificial composed codepoint is generated on the fly for those cases).
A special encoding, "utf8-c8", exists to handle data of unknown encoding: it assumes UTF-8 where possible, but creates artificial codepoints for unencodable sequences, allowing them to round-trip if necessary.
Python 3.x: http://docs.python.org/dev/3.0/whatsnew/3.0.html
A feature that was included in a language when it was first designed is not always the best version of that feature.
Languages change over time, and many have become bloated with extra features while not necessarily keeping the features they started with up to date.
So I'll just throw out the idea that you shouldn't necessarily discount languages that have added Unicode only recently. They have the advantage of adding Unicode to an already mature development tool, and of getting the chance to do it right the first time.
With that in mind, I want to ensure that Delphi is included here, as one of your answers. Embarcadero added Unicode in their Delphi 2009 version and did a mighty fine job on it. It was enough to finally prompt me to upgrade from the Delphi 4 that I had been using for 10 years.
Java uses characters from the Unicode character set.
Java and the .NET languages.

If ASCII operators are definable, why not Unicode symbols?

I'm sure I join many in being glad there's finally a powerful language tied tightly to a mainstream GUI/Database/Communication framework.
I haven't been sure where to post this, but here seems the best spot.
I need to use Unicode symbol characters either as operators or as function names. I'd like syntactic sugar, but I don't need it.
Guy Steele pointed out in Communications of the ACM that "*" was a forced choice when it was adopted from ASCII as multiply, but my software works in Unicode, so I'm not tethered to ASCII anymore.
!$%&*+-./<=>?,#^|~:
Part of localization includes local programmers. Why limit the set of operators that can be defined in F#? It isn't orthogonal to C#'s and F#'s acceptance of many Unicode IsLetter characters in identifiers.
Also, F# is likely to be used for symbolic manipulation of problems from logic, math, physics, etc. It makes the work much easier if there's a direct mapping of the basic operators into the language. (F# and C# already accept many Unicode IsLetter characters, as well as IsDigit, in identifiers. This is a request to also allow Unicode IsSymbol characters as operators, with the precedence of, for example, *; or, since + is both a unary and a binary operator, I could put up with the precedence of + and make up the difference with parenthesized groupings.)
Consider the domain-specific needs of logicians, mathematicians, physicists, etc. I'd rather write a symbolic differentiator or integrator using math symbols than ASCII permutations of already-taken operators.
Logic: ∀ ∃ ⇒
Math: ∑ ∫ ∂
Group theory: ≤ ≥ ∈ ∉
Set Theory: ⊆ ⊇ ⊃ ∪ ∩
Tensors: ⊗
I've written many languages in other languages, but because F# is tightly .NET-integrated, this issue poses special challenges without language support:
It's trivial to cobble up a translator that takes Unicode-operator F# source and maps it, line by line, to ASCII-operator F# source.
But how do I ensure the translation is what gets compiled, while the programmer still sees their own untranslated source when debugging? If I map line-for-line correctly, how do I ensure they can still point at a variable and see its value?
There is a math (Unicode) symbol extension for F# available in the Visual Studio Gallery.
This allows you to define Unicode symbols, e.g.:
let inline (~∑) xs = xs |> Seq.sum
let total = ∑myList
You may be interested in Project Fortress, a functional programming language that embraces the Unicode character set (among many other features). In particular, see the Mathematical Syntax in Fortress page, which contains some sample code.
For an interesting discussion on this check: http://cs.hubfs.net/forums/thread/9690.aspx
Other languages, such as Scala, do permit operators from outside the ASCII range: mathematical symbols (Sm) and other symbols (So).
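A small sketch of what that permits (the membership operators here are user-defined, not part of Scala's standard library):

object SymbolicOps {
  // ∈ (U+2208) and ∉ (U+2209) are in category Sm, so Scala accepts
  // them as operator identifiers.
  implicit class ElemOps[A](x: A) {
    def ∈(s: Set[A]): Boolean = s.contains(x)
    def ∉(s: Set[A]): Boolean = !s.contains(x)
  }
  def main(args: Array[String]): Unit = {
    val primes = Set(2, 3, 5, 7)
    println(5 ∈ primes)  // prints: true
    println(4 ∉ primes)  // prints: true
  }
}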