Why does Scala choose to have the types after the variable names?

In Scala variables are declared like:
var stockPrice: Double = 100.0
Where the type (Double) follows the identifier (stockPrice). Traditionally in imperative languages such as C, Java, C#, the type name precedes the identifier.
double stock_price = 100.0;
Is it purely a matter of taste, or does having the type name at the end help the compiler in any way? Go also has the same style.

Kevin's got it right. The main observation is that the "type name" syntax works great as long as types are short keywords such as int or float:
int x = 1
float d = 0.0
For the price of one you get two pieces of information: "A new definition starts here", and "here's the (result) type of the definition". But we are way past the era of simple primitive types nowadays. If you write
HashMap<Shape, Pair<String, String>> shapeInfo = makeInfo()
the most important part of what you define (the name) is buried behind the type expression. Compare with
val shapeInfo: HashMap[Shape, (String, String)] = makeInfo()
It says clearly
We define a value here, not a variable or method (val)
The name of the thing we define is shapeInfo
If you care about it, here's the type (HashMap[...])

As well as supporting type inference, this also has an ergonomic benefit.
For any given variable name + type, chances are that the name is the more important piece of information. Moving it to the left makes it more prominent, and the code more readable once you're accustomed to the style.
Other ergonomic benefits:
With val, var or def before member names, instead of their type, they all neatly line up in a column.
If you change just the type of a member, or drop it entirely in favour of inference, then a fine-grained diff tool will clearly show that the name is unaltered.
Likewise, changing between a val/var/def is very clear in diffs
Inference should be considered the default behaviour in Scala; you only need type specifications in certain specific scenarios, and even then they're mostly for the compiler's benefit. So putting them at the very start of a declaration emphasises the wrong thing.
"name: Type" instead of "Type name" more closely matches the way most programmers will actually think about a declaration; it's more natural.
The differing C/C++ and Java conventions for pointers and arrays (i.e. * being a prefix on the following name and not a suffix on the preceding type in C/C++, or [] being a valid suffix on both names and types in Java) are still confusing to newcomers or language converts, and cause some very real errors when declaring multiple variables on a single line. Scala leaves no room for doubt and confusion here.
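As a quick sketch of that last point, Scala's name-first form stays unambiguous even when several names share one declaration:
val xs, ys: List[Int] = Nil   // both xs and ys are List[Int]
// contrast C's `int* a, b;`, where only a is a pointer and b is a plain int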

It's afterwards so that it can be removed for type inference:
var stockPrice: Double = 100.0
var stockPrice = 100.0
However, it is not true that imperative languages traditionally have types first. For example, Pascal doesn't.
Now, C does it, and C++, Java and C# are based on C's syntax, so naturally they do it that way too, but that has absolutely nothing to do with imperative languages.

It should be noted that even C doesn't "traditionally" define the type before the variable name, but indeed allows the declarations to be interleaved.
int foo[];
where the type for foo is declared both before and after it, lexically.
Beyond that, I'm guessing this is a distinction without a difference. The compiler developers certainly couldn't care one way or the other.

Does Scala have a value restriction like ML, if not then why?

Here are my thoughts on the question. Can anyone confirm, deny, or elaborate?
I wrote:
Scala doesn’t unify covariant List[A] with a GLB ⊤ assigned to List[Int], bcz afaics in subtyping “biunification” the direction of assignment matters. Thus None must have type Option[⊥] (i.e. Option[Nothing]), ditto Nil type List[Nothing] which can’t accept assignment from an Option[Int] or List[Int] respectively. So the value restriction problem originates from directionless unification and global biunification was thought to be undecidable until the recent research linked above.
You may wish to view the context of the above comment.
ML’s value restriction disallows parametric polymorphism in cases (formerly thought to be rare, but perhaps more prevalent) where it would otherwise be sound (i.e. type safe) to allow it, notably for partial application of curried functions, which is important in functional programming. The alternative typing solutions create a stratification between functional and imperative programming, as well as break encapsulation of modular abstract types. Haskell has an analogous dual, the monomorphisation restriction. OCaml relaxes the restriction in some cases. I elaborated on some of these details.
EDIT: my original intuition as expressed in the above quote (that the value restriction may be obviated by subtyping) is incorrect. The answers IMO elucidate the issue(s) well and I’m unable to decide which in the set containing Alexey’s, Andreas’, or mine, should be the selected best answer. IMO they’re all worthy.
As I explained before, the need for the value restriction -- or something similar -- arises when you combine parametric polymorphism with mutable references (or certain other effects). That is completely independent from whether the language has type inference or not or whether the language also allows subtyping or not. A canonical counter example like
let r : ∀A.Ref(List(A)) = ref [] in
r := ["boo"];
head(!r) + 1
is not affected by the ability to elide the type annotation nor by the ability to add a bound to the quantified type.
Consequently, when you add references to F<: then you need to impose a value restriction to not lose soundness. Similarly, MLsub cannot get rid of the value restriction. Scala enforces a value restriction through its syntax already, since there is no way to even write the definition of a value that would have polymorphic type.
It's much simpler than that. In Scala values can't have polymorphic types, only methods can. E.g. if you write
val id = x => x
its type isn't [A] A => A.
And if you take a polymorphic method e.g.
def id[A](x: A): A = x
and try to assign it to a value
val id1 = id
again the compiler will try (and in this case fail) to infer a specific A instead of creating a polymorphic value.
So the issue doesn't arise.
EDIT:
If you try to reproduce the http://mlton.org/ValueRestriction#_alternatives_to_the_value_restriction example in Scala, the problem you run into isn't the lack of let: val corresponds to it perfectly well. But you'd need something like
val f[A]: A => A = {
  var r: Option[A] = None
  { x => ... }
}
which is illegal. If you write def f[A]: A => A = ... it's legal but creates a new r on each call. In ML terms it would be like
val f: unit -> ('a -> 'a) =
  fn () =>
    let
      val r: 'a option ref = ref NONE
    in
      fn x =>
        let
          val y = !r
          val () = r := SOME x
        in
          case y of
            NONE => x
          | SOME y => y
        end
    end
val _ = f () 13
val _ = f () "foo"
which is allowed by the value restriction.
That is, Scala's rules are equivalent to only allowing lambdas as polymorphic values in ML instead of everything value restriction allows.
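As a hedged side note on what Scala does allow (names below are mine): the nearest stand-in for a polymorphic value in Scala 2 is an object with a polymorphic apply method, where A is bound afresh at each call site, so no state can leak between instantiations:
trait PolyId { def apply[A](x: A): A }
val id: PolyId = new PolyId { def apply[A](x: A): A = x }

id(42)     // A = Int
id("foo")  // A = String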
EDIT: this answer was incorrect before. I have completely rewritten the explanation below to gather my new understanding from the comments under the answers by Andreas and Alexey.
The edit history and the archives of this page at archive.is provide a record of my prior misunderstanding and the discussion. Another reason I chose to edit rather than delete and write a new answer is to retain the comments on this answer.
IMO, this answer is still needed: although Alexey answers the thread title correctly and most succinctly, and Andreas’ elaboration was the most helpful for me in gaining understanding, I think the layman reader may require a different, more holistic (yet hopefully still generative-essence) explanation in order to quickly gain some depth of understanding of the issue. Also, I think the other answers obscure how convoluted a holistic explanation is, and I want naive readers to have the option to taste it. The prior elucidations I’ve found don’t state all the details in English and instead (as mathematicians tend to do for efficiency) rely on the reader to discern the details from the nuances of the symbolic programming-language examples and prerequisite domain knowledge (e.g. background facts about programming language design).
The value restriction arises where we have mutation of referenced1 type parametrised objects2. The type unsafety that would result without the value restriction is demonstrated in the following MLton code example:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
The NONE value (which is akin to null) contained in the object referenced by r can be assigned to a reference with any concrete type for the type parameter 'a, because r has the polymorphic type 'a option ref. That would allow type unsafety: as shown in the example above, the same object referenced by r, having been assigned to both string option ref and int option ref, can be written (i.e. mutated) with a string value via the r1 reference and then read as an int value via the r2 reference. The value restriction makes the above example a compile-time error.
A typing complication arises when preventing3 the (re-)quantification (i.e. binding or determination) of the type parameter (aka type variable) of such a reference (and the object it points to) to a type that differs on reuse of an instance of the reference that was previously quantified with a different type.
Such (arguably bewildering and convoluted) cases arise, for example, where successive function applications (aka calls) reuse the same instance of such a reference. IOW, cases where the type parameters (pertaining to the object) of a reference are (re-)quantified on each function application, yet the same instance of the reference (and the object it points to) is reused on each subsequent application (and quantification) of the function.
Tangentially, the occurrence of these cases is sometimes non-intuitive, due to the lack of an explicit universal quantifier ∀ (since the implicit rank-1 prenex lexical-scope quantification can be dislodged from lexical evaluation order by constructions such as let or coroutines) and due to the arguably greater irregularity (compared to Scala) of when unsafe cases may arise under ML’s value restriction:
Andreas wrote:
Unfortunately, ML does not usually make the quantifiers explicit in its syntax, only in its typing rules.
Reusing a referenced object is, for example, desired for let expressions, which, analogous to math notation, should create and evaluate the instantiation of the substitutions only once, even though they may be lexically substituted more than once within the in clause. So, for example, if the function application is evaluated within the in clause (regardless of whether it also appears there lexically) whilst the type parameters of the substitutions are re-quantified for each application (because the instantiation of the substitutions is only lexically within the function application), then type safety can be lost unless the applications are all forced to quantify the offending type parameters only once (i.e. the offending type parameter is disallowed from being polymorphic).
The value restriction is ML’s compromise: it prevents all unsafe cases while also rejecting some (formerly thought to be rare) safe cases, so as to simplify the type system. The value restriction is considered the better compromise because early (antiquated?) experience with more complicated typing approaches, which didn’t restrict any (or as many) safe cases, caused a bifurcation between imperative and pure functional (aka applicative) programming and leaked some of the encapsulation of abstract types in ML functor modules. I cited some sources and elaborated here.
Tangentially, though, I’m pondering whether the early argument against bifurcation really stands up against the fact that the value restriction isn’t required at all for call-by-name (e.g. Haskell-esque lazy evaluation, when also memoized by need), because conceptually partial applications don’t form closures over already-evaluated state; and call-by-name is required for modular compositional reasoning, and, when combined with purity, for modular (category-theoretic and equational-reasoning) control and composition of effects. The monomorphisation-restriction argument against call-by-name is really about forcing type annotations, yet being explicit where optimal memoization (aka sharing) is required is arguably less onerous, given that said annotation is needed for modularity and readability anyway.
Call-by-value is a fine-toothed-comb level of control, so where we need that low-level control, perhaps we should accept the value restriction, because the rare cases that more complex typing would allow would be less useful in the imperative versus the applicative setting. However, I don’t know whether the two can be stratified/segregated in the same programming language in a smooth/elegant manner. Algebraic effects can be implemented in a CBV language such as ML, and they may obviate the value restriction. IOW, if the value restriction is impinging on your code, it is possibly because your programming language and libraries lack a suitable metamodel for handling effects.
Scala makes a syntactical restriction against all such references, which is a compromise that restricts, for example, the same cases and even more (cases that would be safe if not restricted) than ML’s value restriction does, but which is more regular in the sense that we’ll never be scratching our heads over an error message pertaining to the value restriction. In Scala, we’re never allowed to create such a reference. Thus in Scala we can only express cases where a new instance of a reference is created when its type parameters are quantified. Note that OCaml relaxes the value restriction in some cases.
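To make that syntactic restriction concrete, here is a hedged sketch (identifiers mine) of how attempts to transliterate the unsound ML reference into Scala are blocked:
// val r: Option[A] = None    // rejected: a val has no type-parameter binder
val r0 = None                 // compiles, but r0: None.type is not polymorphic
def r1[A]: Option[A] = None   // compiles, but each call builds a fresh value,
                              // so no single cell is shared at two types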
Note that, afaik, neither Scala nor ML enables declaring that a reference is immutable1, although the object it points to can be declared immutable with val. Note there’s no need for the restriction for references that can’t be mutated.
The reason that mutability of the reference type1 is required for the complicated typing cases to arise is that if we instantiate the reference (e.g. in the substitutions clause of let) with a non-parametrised object (i.e. not None or Nil4, but instead, say, an Option[String] or a List[Int]), then the reference won’t have a polymorphic type (pertaining to the object it points to) and thus the re-quantification issue never arises. So the problematic cases are due to instantiation with a polymorphic object, then subsequently assigning a newly quantified object (i.e. mutating the reference type) in a re-quantified context, followed by dereferencing (reading) from the (object pointed to by the) reference in a subsequent re-quantified context. As aforementioned, when the re-quantified type parameters conflict, the typing complication arises and the unsafe cases must be prevented/restricted.
Phew! If you understood that without reviewing linked examples, I’m impressed.
1 IMO employing the phrase “mutable references” instead of “mutability of the referenced object” and “mutability of the reference type” would be potentially more confusing, because our intention is to mutate the object’s value (and its type) which is referenced by the pointer, not to refer to mutability of the pointer itself. Some programming languages don’t even explicitly distinguish, in the case of primitive types, between mutating the reference and mutating the object it points to.
2 Wherein an object may even be a function, in a programming language that allows first-class functions.
3 To prevent a segmentation fault at runtime due to accessing (read or write of) the referenced object with a presumption about its statically (i.e. at compile-time) determined type which is not the type that the object actually has.
4 Which are NONE and [] respectively in ML.

How do purely functional compilers annotate the AST with type info?

In the syntax analysis phase, an imperative compiler can build an AST out of nodes that already contain a type field that is set to null during construction, and then later, in the semantic analysis phase, fill in the types by assigning the declared/inferred types into the type fields.
How do purely functional languages handle this, where you do not have the luxury of assignment? Is the type-less AST mapped to a different kind of type-enriched AST? Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?
Are there purely functional programming tricks that help the compiler writer with this problem?
I usually rewrite a source AST (or one already lowered by several steps) into a new form, replacing each expression node with a pair (tag, expression).
Tags are unique numbers or symbols that are then used by the next pass, which derives type equations from the AST. E.g., a + b will yield something like { numeric(Tag_a). numeric(Tag_b). equals(Tag_a, Tag_b). equals(Tag_e, Tag_a). }.
Then the type equations are solved (e.g., by simply running them as a Prolog program), and, if successful, all the tags (which are variables in this program) are now bound to concrete types; if not, they're left as type parameters.
In the next step, our previous AST is rewritten again, this time replacing tags with all the inferred type information.
The whole process is a sequence of pure rewrites, no need to replace anything in your AST destructively. A typical compilation pipeline may take a couple of dozens of rewrites, some of them changing the AST datatype.
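For illustration, a minimal Scala sketch of the first rewrite described above (all names are mine, and the equation-solving pass is only indicated in comments):
sealed trait Expr                                   // source AST, untyped
case class Lit(n: Int)           extends Expr
case class Add(l: Expr, r: Expr) extends Expr

sealed trait Tagged                                 // rewrite 1: attach tags
case class TLit(tag: Int, n: Int)               extends Tagged
case class TAdd(tag: Int, l: Tagged, r: Tagged) extends Tagged

def tag(e: Expr, fresh: Iterator[Int]): Tagged = e match {
  case Lit(n)    => TLit(fresh.next(), n)
  case Add(l, r) => TAdd(fresh.next(), tag(l, fresh), tag(r, fresh))
}
// rewrite 2 would walk Tagged emitting equations such as
// numeric(tagL). numeric(tagR). equals(tagL, tagR). and solving them yields a
// tag -> type map, which rewrite 3 uses to produce the typed AST
val tagged = tag(Add(Lit(1), Lit(2)), Iterator.from(0))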
There are several options to model this. You may use the same kind of nullable data fields as in your imperative case:
data Exp = Var Name (Maybe Type) | ...
parse :: String -> Maybe Exp -- types are Nothings here
typeCheck :: Exp -> Maybe Exp -- turns Nothings into Justs
or even, using a more precise type
data Exp ty = Var Name ty | ...
parse :: String -> Maybe (Exp ())
typeCheck :: Exp () -> Maybe (Exp Type)
I can't speak for how it is supposed to be done, but I did do this in F# for a C# compiler here.
The approach was basically: build an AST from the source, leaving things like type information unconstrained - so AST.fs is basically the AST, with strings for the type names, function names, etc.
As the AST starts to be compiled to (in this case) .NET IL, we end up with more type information (we create the types in the source - let's call these type-stubs). This then gives us the information needed to create method-stubs (the code may have signatures that include type-stubs as well as built-in types). From here we now have enough type information to resolve any of the type names or method signatures in the code.
I store that in the file TypedAST.fs. I do this in a single pass; however, the approach may be naive.
Now we have a fully typed AST you could then do things like compile it, fully analyze it, or whatever you like with it.
So in answer to the question "Does that mean I need to define two types per AST node, one for the syntax phase, and one for the semantic phase?", I can't say definitively that this is the case, but it is certainly what I did, and it appears to be what MS have done with Roslyn (although they have essentially decorated the original tree with type info, IIRC).
"Are there purely functional programming tricks that help the compiler writer with this problem?"
Given the ASTs are essentially mirrored in my case, it would be possible to make it generic and transform the tree, but the code may end up (more) horrendous.
i.e.
type AST<'ty> =
    | MethodInvoke of 'ty * Name * 'ty list
    | ...
As when dealing with relational databases, in functional programming it is often a good idea not to put everything in a single data structure.
In particular, there may not be a data structure that is "the AST".
Most probably, there will be data structures that represent parsed expressions. One possible way to deal with type information is to assign a unique identifier (like an integer) to each node of the tree already during parsing and have some suitable data structure (like a hash map) that associates those node-ids with types. The job of the type inference pass, then, would be just to create this map.
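A hedged Scala sketch of that idea (hypothetical names): the parser hands out node ids, and the inference pass returns a map rather than mutating the tree:
final case class NodeId(n: Int)
sealed trait Expr { def id: NodeId }
case class IntLit(id: NodeId, value: Int)     extends Expr
case class Plus(id: NodeId, l: Expr, r: Expr) extends Expr

sealed trait Ty
case object TInt extends Ty

// the job of the inference pass is just to create this map
def infer(e: Expr): Map[NodeId, Ty] = e match {
  case IntLit(id, _)  => Map(id -> TInt)
  case Plus(id, l, r) => infer(l) ++ infer(r) + (id -> TInt)
}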

Why didn't Scala design around Integer Overflow?

I am a former Java developer and I have recently watched the insightful and entertaining introduction to Scala for Java developers by professor Venkat Subramaniam (https://www.youtube.com/watch?v=LH75sJAR0hc).
A major point introduced is the elimination of declared types in lieu of "type inference". Presumably, this means the higher-order compiler recognizes the type I intend to use, by the context.
Being an application security expert by trade, the first thing I tried to do is break this type inference... Example:
// declare a function that returns the square of an input Int. The return type is to be inferred.
scala> val square = (x:Int) => x*x
square: Int => Int = <function1>
// I can see the compiler inferred an Int for the output value, which I do not agree with.
scala> square(2147483647)
res1: Int = 1
// integer overflow
My question is why did the compiler not see that "*" is an operator with a threat of overflow, and wrap the inputs in something a little more protective like a BigInteger?
According to the professor, I am supposed to forget about the internal implementation and just get on with my business logic. But after my quick demonstration I'm not so sure that Scala is safe for a programmer who doesn't understand what the compiler is doing with my methods.
I think @rightfold somewhat overstates how often overflows do or don't happen (particularly when considering an attacker who is actively trying to overflow you). But I agree with his basic point. Converting all math to BigInteger would almost certainly have created a massive performance impact over Java. For developers to choose such a language, they'd have to get something visible for that cost.
String objects have a much smaller performance overhead over cstrings for many operations. They also provide very visible benefits to the developer, which is why people use them, not security per se. There are many common things that string objects make easy to do over cstrings. BigInteger provides none of that. It requires exactly the same code at a fraction of the speed, but just won't overflow (a bug few developers see day to day, even if security guys see it more often).
The equivalent would have been a cstring (with strcmp, strcpy, strcat, etc.) that ran at a fraction of the speed, but just didn't require a null terminator. I don't think many people would have jumped to use that, either, no matter how much that would help security over null-terminated strings. And if the language required it, I don't see a lot of people anxious to use the language.
And as @rightfold suggests in the comments, interoperability with Java would be trashed, since most if not all numbers would wind up being BigInteger. You'd constantly be converting, which raises the same dangers of overflows while adding a lot of code complexity (and more performance impacts).
A from-scratch language might get away with ubiquitous BigInteger (like Python) if the language had a lot of other compelling features, but it's a very hard thing to retrofit into a language that wants to be a natural transition from (and with) Java.
In addition to the above answers, I think this question misunderstands the purpose of type inference in a statically typed language. Type inference does not make the choices that you are referring to - promoting an Int to a BigInt. It is restricted to simply "inferring" the type of an expression based on the known types of subexpressions at compile time.
The * function in Int returns an Int when supplied with an Int input parameter
def *(x: Int): Int
In this case, since x is declared to be an Int, then x*x must be an Int based on the signature of *.
If we really wanted this behavior, we could define a function that promotes Int to BigInt when multiplying.
implicit class SafeInt(x: Int) {
  def safeMult(a: Int): scala.math.BigInt = scala.math.BigInt(x) * a
}
Then we can define a square with the desired property:
scala> val square = (x: Int) => x safeMult x
square: Int => scala.math.BigInt = <function1>
The compiler infers based on the methods available. Int has a method *(Int): Int that is, as far as the compiler knows, perfectly well defined; 2147483647*2147483647 is a perfectly good method call with the result 1, it doesn't throw ClassCastException or anything like that.
Why is the Int type written this way? Largely for Java/JVM compatibility; many parts of Scala have design compromises for the sake of Java compatibility. If you don't need that functionality, you might prefer to use Haskell or a similar language. (I suspect that even without the requirement for JVM compatibility, Scala would have wanted to expose the machine-native integer types so that users could make that performance/correctness tradeoff where desired. They might not have been the default though)
If you're doing numeric computation in Scala you probably want to use the Spire library, which makes it easy to abstract over numeric types, and provides several high-performance numeric types with particular properties. In particular it has a SafeLong type that handles arbitrary-precision integers but with much better performance than BigInt for values which fall within the Long range, similar to Python's integer type.
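For instance, a hedged usage sketch (this assumes Spire is on the classpath and that SafeLong's factory and operators work as I recall; check Spire's docs for the current API):
import spire.math.SafeLong

val x = SafeLong(2147483647)   // fits in a Long, so stays on the fast path
val sq = x * x                 // still within Long range here; SafeLong would
                               // transparently promote to BigInt if it weren't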
Because overflow occurs almost never in practice, and BigInteger is slow as a dog compared to Int. It is also most inconvenient to have all * operations on Ints return BigIntegers.
"Recognizes the type I intend to use" is not an accurate description of what scala tries to do. It infers the most generic type possible given the constraints imposed by the context. Hence if you write List(Nil, "1"), you'll get List[Serializable], because Serializable is an interface that List and String share - disregarding that Serializable was probably not on your mind at all.
The question you're asking could be asked more precisely as "why is Int the type of numeric literals instead of BigInteger?" - inference doesn't have much to do with it.
And we can opine all we want on that topic, but there's one most accurate answer describing why Scala is what it is: "because Java".
If you wanted the kind of safety that you seem to want, then one approach is to define a partial function which guards against numeric overflow and then returns either an Option[Int] or perhaps even an Either[Int, BigInteger].
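As a hedged sketch of that guarded approach (names are mine):
def safeSquare(x: Int): Option[Int] = {
  val r = x.toLong * x.toLong   // widen to Long so the product can't wrap
  if (r.isValidInt) Some(r.toInt) else None
}

safeSquare(3)           // Some(9)
safeSquare(2147483647)  // None: the true square doesn't fit in an Int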
The type inference for your square function is correct - given that it's inferred from the input types you've specified and the type of the * function...it's not really broken in my opinion.

Definition of statically typed and dynamically typed

Which of these two definitions is correct?
Statically typed - Type matching is checked at compile time (and therefore can only be applied to compiled languages)
Dynamically typed - Type matching is checked at run time, or not at all. (this term can be applied to compiled or interpreted languages)
Statically typed - Types are assigned to variables, so that I would say 'x is of type int'.
Dynamically typed - types are assigned to values (if at all), so that I would say 'x is holding an int'
By this definition, static or dynamic typing is not tied to compiled or interpreted languages.
Which is correct, or is neither one quite right?
Which is correct, or is neither one quite right?
The first pair of definitions are closer but not quite right.
Statically typed - Type matching is checked at compile time (and therefore can only be applied to compiled languages)
This is tricky. I think if a language were interpreted but did type checking before execution began then it would still be statically typed. The OCaml REPL is almost an example of this except it technically compiles (and type checks) source code into its own byte code and then interprets the byte code.
Dynamically typed - Type matching is checked at run time, or not at all.
Rather:
Dynamically typed - Type checking is done at run time.
Untyped - Type checking is not done.
Statically typed - Types are assigned to variables, so that I would say 'x is of type int'.
Dynamically typed - types are assigned to values (if at all), so that I would say 'x is holding an int'
Variables are irrelevant. Although in many statically typed languages you only see types explicitly in the source code at variable and function definitions, all of the subexpressions also have static types. For example, "foo" + 3 is usually a static type error because you cannot add a string to an int, but there is no variable involved.
One helpful way to look at the word static is this: static properties are those that hold for all possible executions of the program on all possible inputs. Then you can look at any given language or type system and consider which static properties can it verify, for example:
JavaScript: no segfaults/memory errors
Java/C#/F#: if a program compiled and a variable had a type T, then the variable only holds values of this type - in all executions. But, sadly, reference types also admit null as a value - the billion dollar mistake (see the sketch after this list).
ML has no null, making the above guarantee stronger
Haskell can verify statements about side effects, for example a property such as "this program does not print anything on stdout"
Coq also verifies termination - "this program terminates on all inputs"
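To illustrate the Java/C#/F# item above (and its null caveat), the same guarantee is visible from Scala 2:
var s: String = "hello"
// s = 42               // rejected at compile time: type mismatch
s = null                // accepted: null inhabits every reference type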
How much you want to verify depends on taste and the problem at hand. All magic (verification) comes at a price.
If you have never ever seen ML before, do give it a try. At least give 5 minutes of attention to Yaron Minsky's talk. It can change your life as a programmer.
The second is a better definition in my eyes, assuming you're not looking for an explanation as to why or how things work.
Better again would be to say that
Static typing gives variables an EXPLICIT type that CANNOT change
Dynamic typing gives variables an IMPLICIT type that CAN change
I like the latter definition. Consider the type checking when casting from a base class to a derived class in object-oriented languages like Java or C++: it fits the second definition and not the first. That's a compiled language with (optional) dynamic type checking.

What is the difference between a strongly typed language and a statically typed language?

Also, does one imply the other?
What is the difference between a strongly typed language and a statically typed language?
A statically typed language has a type system that is checked at compile time by the implementation (a compiler or interpreter). The type check rejects some programs, and programs that pass the check usually come with some guarantees; for example, the compiler guarantees not to use integer arithmetic instructions on floating-point numbers.
There is no real agreement on what "strongly typed" means, although the most widely used definition in the professional literature is that in a "strongly typed" language, it is not possible for the programmer to work around the restrictions imposed by the type system. This term is almost always used to describe statically typed languages.
Static vs dynamic
The opposite of statically typed is "dynamically typed", which means that
Values used at run time are classified into types.
There are restrictions on how such values can be used.
When those restrictions are violated, the violation is reported as a (dynamic) type error.
For example, Lua, a dynamically typed language, has a string type, a number type, and a Boolean type, among others. In Lua every value belongs to exactly one type, but this is not a requirement for all dynamically typed languages. In Lua, it is permissible to concatenate two strings, but it is not permissible to concatenate a string and a Boolean.
Strong vs weak
The opposite of "strongly typed" is "weakly typed", which means you can work around the type system. C is notoriously weakly typed because any pointer type is convertible to any other pointer type simply by casting. Pascal was intended to be strongly typed, but an oversight in the design (untagged variant records) introduced a loophole into the type system, so technically it is weakly typed.
Examples of truly strongly typed languages include CLU, Standard ML, and Haskell. Standard ML has in fact undergone several revisions to remove loopholes in the type system that were discovered after the language was widely deployed.
What's really going on here?
Overall, it turns out to be not that useful to talk about "strong" and "weak". Whether a type system has a loophole is less important than the exact number and nature of the loopholes, how likely they are to come up in practice, and what are the consequences of exploiting a loophole. In practice, it's best to avoid the terms "strong" and "weak" altogether, because
Amateurs often conflate them with "static" and "dynamic".
Apparently "weak typing" is used by some persons to talk about the relative prevalance or absence of implicit conversions.
Professionals can't agree on exactly what the terms mean.
Overall you are unlikely to inform or enlighten your audience.
The sad truth is that when it comes to type systems, "strong" and "weak" don't have a universally agreed on technical meaning. If you want to discuss the relative strength of type systems, it is better to discuss exactly what guarantees are and are not provided.
For example, a good question to ask is this: "is every value of a given type (or class) guaranteed to have been created by calling one of that type's constructors?" In C the answer is no. In CLU, F#, and Haskell it is yes. For C++ I am not sure—I would like to know.
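As a sketch of how a language can make the answer "yes" (an illustrative type of my own, not from the answer above), Scala 2 can hide the constructor behind a validating factory:
final class Percent private (val value: Int)
object Percent {
  def apply(v: Int): Option[Percent] =
    if (v >= 0 && v <= 100) Some(new Percent(v)) else None
}

Percent(42)   // Some(...)
Percent(150)  // None: no Percent outside 0..100 can ever exist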
By contrast, static typing means that programs are checked before being executed, and a program might be rejected before it starts. Dynamic typing means that the types of values are checked during execution, and a poorly typed operation might cause the program to halt or otherwise signal an error at run time. A primary reason for static typing is to rule out programs that might have such "dynamic type errors".
Does one imply the other?
On a pedantic level, no, because the word "strong" doesn't really mean anything. But in practice, people almost always do one of two things:
They (incorrectly) use "strong" and "weak" to mean "static" and "dynamic", in which case they (incorrectly) are using "strongly typed" and "statically typed" interchangeably.
They use "strong" and "weak" to compare properties of static type systems. It is very rare to hear someone talk about a "strong" or "weak" dynamic type system. Except for FORTH, which doesn't really have any sort of a type system, I can't think of a dynamically typed language where the type system can be subverted. Sort of by definition, those checks are bulit into the execution engine, and every operation gets checked for sanity before being executed.
Either way, if a person calls a language "strongly typed", that person is very likely to be talking about a statically typed language.
This is often misunderstood so let me clear it up.
Static/Dynamic Typing
Static typing is where the type is bound to the variable. Types are checked at compile time.
Dynamic typing is where the type is bound to the value. Types are checked at run time.
So in Java for example:
String s = "abcd";
s will "forever" be a String. During its life it may point to different Strings (since s is a reference in Java). It may have a null value but it will never refer to an Integer or a List. That's static typing.
In PHP:
$s = "abcd"; // $s is a string
$s = 123; // $s is now an integer
$s = array(1, 2, 3); // $s is now an array
$s = new DOMDocument; // $s is an instance of the DOMDocument class
That's dynamic typing.
Strong/Weak Typing
(Edit alert!)
Strong typing is a phrase with no widely agreed upon meaning. Most programmers who use this term to mean something other than static typing use it to imply that there is a type discipline that is enforced by the compiler. For example, CLU has a strong type system that does not allow client code to create a value of abstract type except by using the constructors provided by the type. C has a somewhat strong type system, but it can be "subverted" to a degree because a program can always cast a value of one pointer type to a value of another pointer type. So for example, in C you can take a value returned by malloc() and cheerfully cast it to FILE*, and the compiler won't try to stop you—or even warn you that you are doing anything dodgy.
(The original answer said something about a value "not changing type at run time". I have known many language designers and compiler writers and have not known one that talked about values changing type at run time, except possibly some very advanced research in type systems, where this is known as the "strong update problem".)
Weak typing implies that the compiler does not enforce a typing discipline, or perhaps that enforcement can easily be subverted.
The original of this answer conflated weak typing with implicit conversion (sometimes also called "implicit promotion"). For example, in Java:
String s = "abc" + 123; // "abc123";
This code is an example of implicit promotion: 123 is implicitly converted to a string before being concatenated with "abc". It can be argued the Java compiler rewrites that code as:
String s = "abc" + new Integer(123).toString();
Consider a classic PHP "starts with" problem:
if (strpos('abcdef', 'abc') == false) {
    // not found
}
The error here is that strpos() returns the index of the match, which is 0 here; 0 is coerced into boolean false and thus the condition is actually true. The solution is to use === instead of == to avoid implicit conversion.
This example illustrates how a combination of implicit conversion and dynamic typing can lead programmers astray.
Compare that to Ruby:
val = "abc" + 123
which is a runtime error because in Ruby the object 123 is not implicitly converted just because it happens to be passed to a + method. In Ruby the programmer must make the conversion explicit:
val = "abc" + 123.to_s
Comparing PHP and Ruby is a good illustration here. Both are dynamically typed languages but PHP has lots of implicit conversions and Ruby (perhaps surprisingly if you're unfamiliar with it) doesn't.
Static/Dynamic vs Strong/Weak
The point here is that the static/dynamic axis is independent of the strong/weak axis. People probably confuse them in part because strong vs weak typing is less clearly defined: there is no real consensus on exactly what is meant by strong and weak. For this reason strong/weak typing is far more a shade of grey than black or white.
So to answer your question: another way to look at this that's mostly correct is to say that static typing is compile-time type safety and strong typing is runtime type safety.
The reason for this is that variables in a statically typed language have a type that must be declared and can be checked at compile time. A strongly-typed language has values that have a type at run time, and it's difficult for the programmer to subvert the type system without a dynamic check.
But it's important to understand that a language can be Static/Strong, Static/Weak, Dynamic/Strong or Dynamic/Weak.
Both are poles on two different axes:
strongly typed vs. weakly typed
statically typed vs. dynamically typed
Strongly typed means a variable will not be automatically converted from one type to another. Weakly typed is the opposite: Perl can use a string like "123" in a numeric context by automatically converting it into the int 123. A strongly typed language like Python will not do this.
Statically typed means the compiler figures out the type of each variable at compile time. Dynamically typed languages only figure out the types of variables at runtime.
Strongly typed means that there are restrictions on conversions between types.
Statically typed means that the types are not dynamic - you cannot change the type of a variable once it has been created.
An answer is already given above; this attempts to differentiate between the strong vs. weak and static vs. dynamic concepts.
What is Strongly typed VS Weakly typed?
Strongly typed: values will not be automatically converted from one type to another.
In strongly typed languages like Go or Python, "2" + 8 will raise a type error, because they don't allow "type coercion".
Weakly (loosely) typed: values will be automatically converted from one type to another:
Weakly typed languages like JavaScript or Perl won't throw an error; in this case JavaScript produces '28' and Perl produces 10.
Perl Example:
my $a = "2" + 8;
print $a,"\n";
Save it as main.pl, run perl main.pl, and you will get the output 10.
What is Static VS Dynamic type?
Programmers define static typing and dynamic typing with respect to the point at which variable types are checked. Statically typed languages are those in which type checking is done at compile time, whereas dynamically typed languages are those in which type checking is done at run time.
Static: Types checked before run-time
Dynamic: Types checked on the fly, during execution
What does this mean?
Go checks types before run time (a static check). This means it doesn't just translate and type-check the code it's executing: it scans through all the code, and a type error is thrown before the code is even run. For example,
package main

import "fmt"

func foo(a int) {
    if a > 0 {
        fmt.Println("I am feeling lucky (maybe).")
    } else {
        fmt.Println("2" + 8)
    }
}

func main() {
    foo(2)
}
Save this file as main.go and run it; you will get a compilation failure message.
go run main.go
# command-line-arguments
./main.go:9:25: cannot convert "2" (type untyped string) to type int
./main.go:9:25: invalid operation: "2" + 8 (mismatched types string and int)
But this is not the case for Python. For example, the following block of code will execute for the first foo(2) call and will fail for the second foo(0) call. That's because Python is dynamically typed: it only translates and type-checks code it's actually executing. The else block never executes for foo(2), so "2" + 8 is never even looked at; for the foo(0) call it tries to execute that block and fails.
def foo(a):
    if a > 0:
        print 'I am feeling lucky.'
    else:
        print "2" + 8

foo(2)
foo(0)
You will see the following output:
python main.py
I am feeling lucky.
Traceback (most recent call last):
File "pyth.py", line 7, in <module>
foo(0)
File "pyth.py", line 5, in foo
print "2" + 8
TypeError: cannot concatenate 'str' and 'int' objects
Data coercion does not necessarily mean weakly typed, because sometimes it's syntactic sugar:
The example above of Java being weakly typed because of
String s = "abc" + 123;
is not an example of weak typing, because it's really doing:
String s = "abc" + new Integer(123).toString()
Data coercion is also not weak typing if you are constructing a new object.
Java is a very bad example of a weakly typed language (and any language that has good reflection will most likely not be weakly typed), because the runtime of the language always knows what the type is (the exception might be native types).
This is unlike C. C is one of the best examples of a weakly typed language. The runtime has no idea whether 4 bytes is an integer, a struct, a pointer, or 4 characters.
The runtime of the language really defines whether or not it's weakly typed; otherwise it's really just opinion.
EDIT:
After further thought, this is not necessarily true, as the runtime does not have to have all the types reified in the runtime system to be strongly typed.
Haskell and ML have such complete static analysis that they can potentially omit type information from the runtime.
One does not imply the other. For a language to be statically typed it means that the types of all variables are known or inferred at compile time.
A strongly typed language does not allow you to use one type as another. C is a weakly typed language and is a good example of what strongly typed languages don't allow. In C you can pass a data element of the wrong type and the compiler will not complain. In strongly typed languages you cannot.
Strong typing probably means that variables have a well-defined type and that there are strict rules about combining variables of different types in expressions. For example, if A is an integer and B is a float, then the strict rule about A+B might be that A is cast to a float and the result returned as a float. If A is an integer and B is a string, then the strict rule might be that A+B is not valid.
Static typing probably means that types are assigned at compile time (or its equivalent for non-compiled languages) and cannot change during program execution.
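A small Scala sketch of such strict rules (List used as an arbitrary incompatible type):
val a: Int = 1
val b: Float = 2.5f
val c = a + b           // valid: the Int is widened, c is a Float (3.5f)
// val d = a + List(1)  // invalid: no rule combines Int and List[Int]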
Note that these classifications are not mutually exclusive, indeed I would expect them to occur together frequently. Many strongly-typed languages are also statically-typed.
And note that when I use the word 'probably' it is because there are no universally accepted definitions of these terms. As you will already have seen from the answers so far.
IMHO, it is better to avoid these definitions altogether: not only is there no agreed-upon definition of the terms, but the definitions that do exist tend to focus on technical aspects, for example whether operations on mixed types are allowed, and if not, whether there is a loophole that bypasses the restrictions, such as working your way around them using pointers.
Instead, and emphasizing again that it is an opinion, one should focus on the question: Does the type system make my application more reliable? A question which is application specific.
For example: if my application has a variable named acceleration, then clearly if the way the variable is declared and used allows the assignment of the value "Monday" to acceleration it is a problem, as clearly an acceleration cannot be a weekday (and a string).
Another example: in Ada one can define subtype Month_Day is Integer range 1..31;. The type Month_Day is weak in the sense that it is not a separate type from Integer (because it is a subtype), but it is restricted to the range 1..31. In contrast, type Month_Day is new Integer; will create a distinct type, which is strong in the sense that it cannot be mixed with integers without explicit casting - but it is not restricted, so it can receive the senseless value -17. So technically it is stronger, but less reliable.
Of course, one can declare type Month_Day is new Integer range 1..31; to create a type which is distinct and restricted.
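For comparison, a rough Scala analogue of the distinct, restricted type (names are mine; the range check here happens at run time rather than in the type):
final case class MonthDay(value: Int) {
  require(1 <= value && value <= 31, s"$value is not a valid Month_Day")
}

MonthDay(17)        // ok
// MonthDay(-17)    // throws IllegalArgumentException at run time
// MonthDay(3) + 1  // does not compile: MonthDay is not an Integer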