Why didn't Scala design around integer overflow?

I am a former Java developer and I have recently watched the insightful and entertaining introduction to Scala for Java developers by Professor Venkat Subramaniam (https://www.youtube.com/watch?v=LH75sJAR0hc).
A major point introduced is the elimination of declared types in favor of "type inference". Presumably, this means the higher-order compiler recognizes the type I intend to use, by the context.
Being an application security expert by trade, the first thing I tried to do was break this type inference... Example:
// declare a function that returns the square of an input Int. The return type is to be inferred.
scala> val square = (x:Int) => x*x
square: Int => Int = <function1>
// I can see the compiler inferred an Int for the output value, which I do not agree with.
scala> square(2147483647)
res1: Int = 1
// integer overflow
My question is why did the compiler not see that "*" is an operator with a threat of overflow, and wrap the inputs in something a little more protective like a BigInteger?
According to the professor, I am supposed to forget about the internal implementation and just get on with my business logic. But after my quick demonstration, I'm not so sure that Scala is safe for a programmer who doesn't understand what the compiler is doing with their methods.

I think @rightfold somewhat overstates how rarely overflows happen (particularly when considering an attacker who is actively trying to overflow you). But I agree with his basic point. Converting all math to BigInteger would almost certainly have created a massive performance impact over Java. For developers to choose such a language, they'd have to get something visible for that cost.
String objects carry a much smaller performance overhead over cstrings than BigInteger does over Int for many operations, and they also provide very visible benefits to the developer, which is why people use them; security is not the main draw per se. There are many common things that string objects make easy to do compared to cstrings. BigInteger provides none of that: it requires exactly the same code at a fraction of the speed, and just won't overflow (a bug few developers see day to day, even if security folks see it more often).
The equivalent would have been a cstring (with strcmp, strcpy, strcat, etc.) that ran at a fraction of the speed, but just didn't require a null terminator. I don't think many people would have jumped to use that, either, no matter how much that would help security over null-terminated strings. And if the language required it, I don't see a lot of people anxious to use the language.
And as @rightfold suggests in the comments, interoperability with Java would be trashed, since most if not all numbers would wind up being BigInteger. You'd constantly be converting, which raises the same dangers of overflow while adding a lot of code complexity (and more performance impact).
A from-scratch language might get away with ubiquitous BigInteger (like Python) if the language had a lot of other compelling features, but it's a very hard thing to retrofit into a language that wants to be a natural transition from (and with) Java.

In addition to the above answers, I think this question misunderstands the purpose of type inference in a statically typed language. Type inference does not make the choices that you are referring to, such as promoting an Int to a BigInt. It is restricted to simply "inferring" the type of an expression based on the known types of its subexpressions at compile time.
The * method on Int returns an Int when supplied with an Int parameter:
def *(x: Int): Int
In this case, since x is declared to be an Int, then x*x must be an Int based on the signature of *.
If we really wanted this behavior, we could define a function that promotes Int to BigInt when multiplying.
implicit class SafeInt(x: Int) {
  def safeMult(a: Int): scala.math.BigInt = scala.math.BigInt(x) * a
}
Then we can define square with the desired property:
scala> val square = (x: Int) => x safeMult x
square: Int => scala.math.BigInt = <function1>
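With the implicit class in scope, the original overflowing call now returns the exact result (shown as it would appear in the REPL; the res counter will vary with your session):
scala> square(2147483647)
res2: scala.math.BigInt = 4611686014132420609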

The compiler infers based on the methods available. Int has a method *(Int): Int that is, as far as the compiler knows, perfectly well defined; 2147483647*2147483647 is a perfectly good method call with the result 1. It doesn't throw a ClassCastException or anything like that.
Why is the Int type written this way? Largely for Java/JVM compatibility; many parts of Scala have design compromises for the sake of Java compatibility. If you don't need that functionality, you might prefer to use Haskell or a similar language. (I suspect that even without the requirement for JVM compatibility, Scala would have wanted to expose the machine-native integer types so that users could make that performance/correctness tradeoff where desired. They might not have been the default though)
If you're doing numeric computation in Scala you probably want to use the Spire library, which makes it easy to abstract over numeric types, and provides several high-performance numeric types with particular properties. In particular it has a SafeLong type that handles arbitrary-precision integers but with much better performance than BigInt for values which fall within the Long range, similar to Python's integer type.
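For illustration, a quick sketch with Spire (assuming a recent Spire release on the classpath; the type shown is spire.math.SafeLong):
import spire.math.SafeLong

// Backed by a Long until a result overflows, then transparently
// promoted to arbitrary precision, so this stays exact:
val big = SafeLong(2147483647L) * SafeLong(2147483647L)
// big == SafeLong(4611686014132420609L)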

Because overflow almost never occurs in practice, and BigInteger is slow as a dog compared to Int. It is also most inconvenient to have all * operations on Ints return BigIntegers.

"Recognizes the type I intend to use" is not an accurate description of what scala tries to do. It infers the most generic type possible given the constraints imposed by the context. Hence if you write List(Nil, "1"), you'll get List[Serializable], because Serializable is an interface that List and String share - disregarding that Serializable was probably not on your mind at all.
The question you're asking could be asked more precisely as "why is Int the type of numeric literals instead of BigInteger?" - inference doesn't have much to do with it.
And we can opine all we want on that topic, but there's one most accurate answer describing why Scala is what it is: "because Java".

If you want the kind of safety that you seem to want, then one approach is to define a guarded function which checks for numeric overflow and returns an Option[Int], or perhaps an Either[Int, BigInteger].
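A minimal sketch of that idea, using the JDK's exact-arithmetic helper (Math.multiplyExact throws ArithmeticException on overflow; safeSquare is a made-up name for illustration):
// Squares x, or returns None when the result would overflow Int
def safeSquare(x: Int): Option[Int] =
  try Some(Math.multiplyExact(x, x))
  catch { case _: ArithmeticException => None }

safeSquare(46340)      // Some(2147395600)
safeSquare(2147483647) // None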
The type inference for your square function is correct: given the input type you've specified and the signature of the * function, it's not really broken, in my opinion.

Related

Do any programming languages provide the ability to name the return value of a function?

Quite commonly while programming I find it necessary to document the value that a function returns. In the Java/Scala world, you often use comments above the function to do this.
However, this can stand out in contrast to the first-class documentation that function parameters get in all languages. For example:
def exponent(base: Int, power: Int): Int
Here we have the signature for a method that raises base to the power power and returns... probably the result of that computation? I know for certain it returns an Int, and it seems quite reasonable to infer that the return value is indeed the result of calculating base ^ power, but in many functions I've written and read it is not possible to infer the return value's semantic meaning quite so easily and you need to study the documentation and/or actually use the method to find out.
Which leads me to wonder, do any languages provide support for optionally declaring a semantic name for the return value?
def exponent(base: Int, power: Int): Int(exitCode)
Aha! Turns out this function actually returns an indication of whether the operation succeeded or failed! Look, it is so clear right there in the method signature! My IDE could also intelligently create a variable with the same name when I call this method, a la:
// Typing in IntelliJ
exponent(5, 5)<TAB>
// Autocompletes to:
val exitCode = exponent(5, 5)
I'm not under any illusion that this is some sort of ground-breaking idea, but it seems like it could be generally useful, and I'm struck that I have never seen this concept implemented in any programming language.
Can you name any single programming language that does have this kind of semantic naming of return values?
In APL, for instance, the result of a function is declared as a variable. The function declaration in your example could be written like
exitCode ← base exponent power
in APL. However, a function with no side effects should always be named after the result it returns. If the function can fail I would use a value that is never returned on success, for instance -1 in this case.
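Tangentially, the closest lightweight approximation in Scala itself is a semantic type alias for the return type (purely illustrative; it documents intent in the signature but enforces nothing):
type ExitCode = Int

def exponent(base: Int, power: Int): ExitCode = ???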

When to use implicit parameters

I've been using Scala at work, and I have a question related to implicit parameters.
Often I've seen executionContext defined in method definitions and also in class definitions.
At the same time I've seen classes that accepts case classes that contain configuration data (timeout, adapter, port, etc.) as regular parameters.
My question is: when passing configuration, why is this parameter not defined as implicit?
Or, the other way around, what if executionContext were defined as a regular parameter?
I'm trying to understand when to use implicit parameters and when not to use them.
EDIT: maybe passing a case class is not the best example; it was just the first idea that came to my mind.
Conceptually, implicits are something "external" to the application logic, and explicit parameters are ... well ... explicit.
Consider a function def f(x: Double): Double = x*x
It is a pure function that transforms a given real number into another real number. It makes sense for x to be an explicit parameter, as it is an intrinsic part of what this function is.
Now, suppose you were implementing some sort of approximate algorithm for multiplication, and wanted to control the precision with which your function computes the answer.
You could do def f(x: Double, precision: Int): Double = ???. It would work, but is inconvenient and kinda clumsy:
Function definition no longer expresses the conceptual "nature" of the function being a pure transformation on the set of real numbers
It makes things complicated at the call site, because everyone using your function must now be aware of this additional parameter and pass it around (imagine you are writing a library for non-engineer math majors to use; they understand abstract transformations and complex formulas, but couldn't care less about numeric precision: how often do you think about precision when you need to compute the area of a square?).
It also makes existing code harder to read and modify
So, to make it prettier, you can do def f(x: Double)(implicit precision: Int) = ???. This has the advantage of saying exactly what you want: "I have a transformation double => double that will use the implied precision when the actual result is computed." Those math majors can now write their abstract formulas the way they are used to: val area = square(x), without polluting their logic with annoying configuration they don't really care about.
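As a sketch of what that might look like (the rounding body is made up for illustration; note also the caveat elsewhere on this page about using a bare Int as an implicit, done here only to mirror the example):
def f(x: Double)(implicit precision: Int): Double =
  BigDecimal(x * x).setScale(precision, BigDecimal.RoundingMode.HALF_UP).toDouble

implicit val precision: Int = 3
val area = f(2.33) // the compiler fills in the precision: 5.429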
When to use this exactly is, certainly, a question of opinion and taste (which is expressly forbidden on SO). Someone can certainly argue about the above example that precision is actually a part of the transformation definition, because 5.429 and 5.4289 (the results of f(2.33)(3) and f(2.33)(4) respectively) are two different numbers.
So, at the end of the day, you just gotta use your judgement and your common sense to make a decision for every case you come across.
When using existing libraries, there is another consideration. Consider:
def foo(f: Future[Int], ec: ExecutionContext) =
  f.map { x => x*x }(ec)
    .map { _.toString }(ec)
    .foreach(println)(ec)
This would look a lot nicer and less messy if you made ec implicit, regardless of where you stand philosophically on whether to consider it a part of your transformation or not:
def foo(f: Future[Int])(implicit ec: ExecutionContext) =
  f.map { x => x*x }.map(_.toString).foreach(println)
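At the call site, the context is then supplied once, implicitly (a sketch; ExecutionContext.global is just the simplest choice here):
import scala.concurrent.{ExecutionContext, Future}

implicit val ec: ExecutionContext = ExecutionContext.global
foo(Future(21)) // the same ec is threaded through map/map/foreach by the compiler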
Implicits can be used when:
you need only one value of some type
it is unambiguous how such a value would be defined
this includes both manual definition as well as using metaprogramming to generate the value based on e.g. how its type is defined
Futures and Akka decided that passing some "globals" as implicits is a reasonable use case, so they would pass as implicits:
ExecutionContext
ActorSystem, Materializer
various configs like Timeout
in general things which you don't want to be put into some static field, but which are passed around everywhere.
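For example, Akka's ask pattern resolves its Timeout from implicit scope (a sketch assuming Akka classic; the "status" message is a placeholder):
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.duration._

// The caller installs a Timeout once...
implicit val timeout: Timeout = Timeout(5.seconds)

// ...and every ask site picks it up implicitly
def status(ref: ActorRef) = ref ? "status"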
However, the rest of the Scala world solves this issue with abstractions that pass these things under the hood: builders, constructors, abstractions over (dependencies) => result functions, etc.
E.g. cats.effect.IO doesn't need to pass an ExecutionContext around, because it passes its scheduler around when you run it; only when you want to explicitly change the pool things run on do you have to use a dedicated method. In Monix, running things also requires you to pass a Scheduler at the end, when the whole computation is composed. So both approaches let you give up passing around all these ExecutionContexts. In the case of Future it is necessary, because you need control over thread pools, but you also evaluate things eagerly, and passing the ec manually (futureA.flatMap(f)(ec)) would break for-comprehensions.
As a result, outside the Akka ecosystem and raw Futures, implicits are more often used to carry around type classes, as a means to decouple business logic from particular implementations, to allow adding support for new types without modifying the code that uses these implementations, and so on. (There are tons of examples of type classes in Scala, so I'll skip them here.)
Usually, when I read about people using implicits to pass configs around, it is just a matter of time before it ends in grief. Akka and ExecutionContexts more or less require implicits, but for configs you should just pass them explicitly. You can group configs into case classes to pass a bunch of them around, and it is not that much of an issue. You can also gather everything required as implicits explicitly into one place and do:
case class Configs(dbEC: EC, mapEC: EC)

class SomeBehavior(configs: Configs) {
  def someAction = {
    if (...) {
      implicit val ec: EC = configs.dbEC
      ...
    } else {
      implicit val ec: EC = configs.mapEC
      ...
    }
  }
}
to make them implicit only in the place that needs them. A good rule of thumb is: do you care if there is something passed around that you don't see right in the code? Usually the answer is yes, you would prefer to see it; the only exceptions are cases where it is somewhat obvious where the value comes from, or where you know the value will be fine and don't care where it came from.
There are a multitude of use-cases of implicit in Scala: under the hood, they boil down to leveraging the compiler's implicit resolution mechanism to fill in things that might not have explicitly been mentioned, but the use-cases are divergent enough that in Scala 3, each use-case (of those that survive into Scala 3...) gets encoded with a different keyword.
In the case of the execution context, implicit arguments are being used to mimic dynamic scope in a language which is normally statically scoped. The primary win from doing this is that it allows behavior further down the call stack to be decided-upon much further up the call stack without having to always explicitly pass on the behavior through the intervening layers of the stack (while providing a way for those intervening layers to cleanly force a different behavior).
Historically, a major example of this was for things like numeric precision. Many numeric operations end up being implemented through iterated refinement (e.g. when square root was implemented in software, it might be implemented using Newton's method), which means there's a trade-off between speed of calculation and precision (i.e. accuracy) of the result. With dynamic scoping, there's a neat way to accomplish this: a global variable for the desired level of precision in mathematical results. Your numeric routine checks the value of that variable and governs itself accordingly. The difference from globals in a statically-scoped language is that when A calls B which calls C, if A sets the value of x to 1 and B sets it to 2, x will be 2 when checked in C or B, but once B returns to A, x will once again be 1 (in dynamically scoped languages, you can think of a global variable as really being a name for a stack of values, and the language implementation automatically pops the stack as appropriate).
Dynamic scoping was once fairly popular (especially in Lisps before the mid/late 1970s); nowadays the only places you really see it are Bourne shells (including bash) and Emacs Lisp, while some languages (Perl and Common Lisp are probably the two main examples) are hybrids: a variable gets declared in a special way to make it dynamically or statically scoped. Static scoping has pretty clearly won: it's easier for the language implementation and the programmer to reason about.
The cost of that ease is that, in our numeric computation example, we end up with something like the following:
def newtonSqrt(x: Double, precision: Int): Double = ???

/** Calculates the length of the hypotenuse of a right triangle with legs of given lengths
 */
def hypotenuse(x: Double, y: Double, precision: Int): Double =
  newtonSqrt(x*x + y*y, precision)
Thankfully, Scala supports default arguments, so we can at least avoid writing separate overloads that use a default precision. Arguably, the precision parameter is exposing an implementation detail (the fact that our calculations aren't necessarily perfectly mathematically accurate): the important thing is that the length of the hypotenuse is the square root of the sum of the squares of the legs.
In Scala, we can make the precision implicit:
// DON'T ACTUALLY PASS AN INT IMPLICITLY!!!!!!
def newtonSqrt(x: Double)(implicit precision: Int): Double = ???

def hypotenuse(x: Double, y: Double)(implicit precision: Int): Double =
  newtonSqrt(x*x + y*y)
(It's actually really bad to ever pass a primitive or any type which could plausibly be used for something other than describing the behavior in question through the implicit mechanism: I'm doing it here for didactic clarity).
The compiler will effectively translate newtonSqrt(x*x + y*y) to (something very similar to) newtonSqrt(x*x + y*y)(precision). Now callers of hypotenuse can decide to fix the precision via an implicit val or to defer the choice to their callers by adding the implicit to their own signature.
Dynamic scoping has long been controversial, so it's no surprise that even the constrained dynamic scoping this usage of implicit embeds is controversial. In Scala's case, it doesn't help that in many cases the tooling throws up its hands when it comes to helping you figure out implicits: most of the really furious compiler errors one encounters are related to missing implicits or collisions, and tracing to figure out which values are in the implicit scope at any time is not something the tooling has a history of helping people with. Thus there are many developers who have decided that explicitly threading through configuration is superior to using implicits.
It's largely a matter of taste and the situation whether this sort of behavior description is best passed implicitly or explicitly (and it's worth noting that the type-class pattern, especially without a hard requirement for coherence (that there be one and only one possible way to describe the behavior) as is typical in Scala, is just a special case of this behavior description).
I should also note that it isn't a binary choice between bundling a few settings into a case class vs. passing them implicitly: you can do both:
case class ProcessSettings(sys: ActorSystem, ec: ExecutionContext)

object ProcessSettings {
  implicit def implicitly(implicit sys: ActorSystem, ec: ExecutionContext): ProcessSettings =
    ProcessSettings(sys, ec)
}
def doStuff(x: SomeInput)(implicit settings: ProcessSettings)
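Usage would then look something like this (a sketch; the system name and someInput are placeholders):
implicit val system: ActorSystem = ActorSystem("example")
import system.dispatcher // brings the system's ExecutionContext into implicit scope

// No ProcessSettings is passed explicitly: the compiler applies
// ProcessSettings.implicitly to the ActorSystem and ExecutionContext above
doStuff(someInput)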

Does Scala have a value restriction like ML? If not, then why not?

Here are my thoughts on the question. Can anyone confirm, deny, or elaborate?
I wrote:
Scala doesn’t unify covariant List[A] with a GLB ⊤ assigned to List[Int], bcz afaics in subtyping “biunification” the direction of assignment matters. Thus None must have type Option[⊥] (i.e. Option[Nothing]), ditto Nil type List[Nothing] which can’t accept assignment from an Option[Int] or List[Int] respectively. So the value restriction problem originates from directionless unification and global biunification was thought to be undecidable until the recent research linked above.
You may wish to view the context of the above comment.
ML’s value restriction will disallow parametric polymorphism in (formerly thought to be rare but maybe more prevalent) cases where it would otherwise be sound (i.e. type safe) to do so such as especially for partial application of curried functions (which is important in functional programming), because the alternative typing solutions create a stratification between functional and imperative programming as well as break encapsulation of modular abstract types. Haskell has an analogous dual monomorphisation restriction. OCaml has a relaxation of the restriction in some cases. I elaborated about some of these details.
EDIT: my original intuition as expressed in the above quote (that the value restriction may be obviated by subtyping) is incorrect. The answers IMO elucidate the issue(s) well and I’m unable to decide which in the set containing Alexey’s, Andreas’, or mine, should be the selected best answer. IMO they’re all worthy.
As I explained before, the need for the value restriction -- or something similar -- arises when you combine parametric polymorphism with mutable references (or certain other effects). That is completely independent from whether the language has type inference or not or whether the language also allows subtyping or not. A canonical counter example like
let r : ∀A.Ref(List(A)) = ref [] in
r := ["boo"];
head(!r) + 1
is not affected by the ability to elide the type annotation nor by the ability to add a bound to the quantified type.
Consequently, when you add references to F<:, you need to impose a value restriction to avoid losing soundness. Similarly, MLsub cannot get rid of the value restriction. Scala enforces a value restriction through its syntax already, since there is no way to even write the definition of a value that would have a polymorphic type.
It's much simpler than that. In Scala values can't have polymorphic types, only methods can. E.g. if you write
val id = x => x
its type isn't [A] A => A.
And if you take a polymorphic method e.g.
def id[A](x: A): A = x
and try to assign it to a value
val id1 = id
again the compiler will try (and in this case fail) to infer a specific A instead of creating a polymorphic value.
So the issue doesn't arise.
EDIT:
If you try to reproduce the http://mlton.org/ValueRestriction#_alternatives_to_the_value_restriction example in Scala, the problem you run into isn't the lack of let: val corresponds to it perfectly well. But you'd need something like
val f[A]: A => A = {
  var r: Option[A] = None
  { x => ... }
}
which is illegal. If you write def f[A]: A => A = ... it's legal but creates a new r on each call. In ML terms it would be like
val f: unit -> ('a -> 'a) =
  fn () =>
    let
      val r: 'a option ref = ref NONE
    in
      fn x =>
        let
          val y = !r
          val () = r := SOME x
        in
          case y of
            NONE => x
          | SOME y => y
        end
    end

val _ = f () 13
val _ = f () "foo"
which is allowed by the value restriction.
That is, Scala's rules are equivalent to only allowing lambdas as polymorphic values in ML instead of everything value restriction allows.
EDIT: this answer was incorrect before. I have completely rewritten the explanation below to gather my new understanding from the comments under the answers by Andreas and Alexey.
The edit history and the archived copies of this page at archive.is provide a record of my prior misunderstanding and the discussion. Another reason I chose to edit rather than delete and write a new answer is to retain the comments on this answer. IMO, this answer is still needed because, although Alexey answers the thread title correctly and most succinctly (and Andreas’ elaboration was the most helpful for me to gain understanding), I think the layman reader may require a different, more holistic (yet hopefully still generative essence) explanation in order to quickly gain some depth of understanding of the issue. Also I think the other answers obscure how convoluted a holistic explanation is, and I want naive readers to have the option to taste it. The prior elucidations I’ve found don’t state all the details in English and instead (as mathematicians tend to do for efficiency) rely on the reader to discern the details from the nuances of the symbolic programming language examples and prerequisite domain knowledge (e.g. background facts about programming language design).
The value restriction arises where we have mutation of referenced1 type parametrised objects2. The type unsafety that would result without the value restriction is demonstrated in the following MLton code example:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
The NONE value (which is akin to null) contained in the object referenced by r can be assigned to a reference with any concrete type for the type parameter 'a, because r has the polymorphic type 'a option ref. That would allow type unsafety because, as shown in the example above, the same object referenced by r, which has been assigned to both string option ref and int option ref, can be written (i.e. mutated) with a string value via the r1 reference and then read as an int value via the r2 reference. The value restriction generates a compiler error for the above example.
A typing complication arises to prevent3 the (re-)quantification (i.e. binding or determination) of the type parameter (aka type variable) of a said reference (and the object it points to) to a type which differs when reusing an instance of said reference that was previously quantified with a different type.
Such (arguably bewildering and convoluted) cases arise, for example, where successive function applications (aka calls) reuse the same instance of such a reference. IOW, cases where the type parameters (pertaining to the object) of a reference are (re-)quantified each time the function is applied, yet the same instance of the reference (and the object it points to) is reused for each subsequent application (and quantification) of the function.
Tangentially, the occurrence of these is sometimes non-intuitive due to the lack of an explicit universal quantifier ∀ (since the implicit rank-1 prenex lexical scope quantification can be dislodged from lexical evaluation order by constructions such as let or coroutines) and the arguably greater irregularity (as compared to Scala) of when unsafe cases may arise under ML’s value restriction:
Andreas wrote:
Unfortunately, ML does not usually make the quantifiers explicit in its syntax, only in its typing rules.
Reusing a referenced object is, for example, desired for let expressions, which, analogous to math notation, should only create and evaluate the instantiation of the substitutions once, even though they may be lexically substituted more than once within the in clause. So, for example, if the function application is evaluated (regardless of whether also lexically or not) within the in clause, whilst the type parameters of the substitutions are re-quantified for each application (because the instantiation of the substitutions occurs only lexically within the function application), then type safety can be lost if the applications aren’t all forced to quantify the offending type parameters only once (i.e. to disallow the offending type parameter from being polymorphic).
The value restriction is ML’s compromise to prevent all unsafe cases while also preventing some (formerly thought to be rare) safe cases, so as to simplify the type system. The value restriction is considered a better compromise, because the early (antiquated?) experience with more complicated typing approaches that didn’t restrict any or as many safe cases caused a bifurcation between imperative and pure functional (aka applicative) programming and leaked some of the encapsulation of abstract types in ML functor modules. I cited some sources and elaborated here.

Tangentially though, I’m pondering whether the early argument against bifurcation really stands up against the fact that the value restriction isn’t required at all for call-by-name (e.g. Haskell-esque lazy evaluation, when also memoized by need), because conceptually partial applications don’t form closures on already evaluated state; and call-by-name is required for modular compositional reasoning and, when combined with purity, for modular (category theory and equational reasoning) control and composition of effects. The monomorphisation restriction argument against call-by-name is really about forcing type annotations, yet being explicit when optimal memoization (aka sharing) is required is arguably less onerous, given that said annotation is needed for modularity and readability anyway. Call-by-value is a fine-toothed-comb level of control, so where we need that low-level control, perhaps we should accept the value restriction, because the rare cases that more complex typing would allow would be less useful in the imperative versus applicative setting. However, I don’t know if the two can be stratified/segregated in the same programming language in a smooth/elegant manner. Algebraic effects can be implemented in a CBV language such as ML and they may obviate the value restriction. IOW, if the value restriction is impinging on your code, possibly it’s because your programming language and libraries lack a suitable metamodel for handling effects.
Scala makes a syntactical restriction against all such references, which is a compromise that restricts, for example, the same and even more cases (that would be safe if not restricted) than ML’s value restriction, but is more regular in the sense that we’ll never be scratching our heads about an error message pertaining to the value restriction. In Scala, we’re never allowed to create such a reference. Thus in Scala, we can only express cases where a new instance of a reference is created when its type parameters are quantified. Note that OCaml relaxes the value restriction in some cases.
Note afaik both Scala and ML don’t enable declaring that a reference is immutable1, although the object they point to can be declared immutable with val. Note there’s no need for the restriction for references that can’t be mutated.
The reason that mutability of the reference type1 is required in order to make the complicated typing cases arise, is because if we instantiate the reference (e.g. in for example the substitutions clause of let) with a non-parametrised object (i.e. not None or Nil4 but instead for example a Option[String] or List[Int]), then the reference won’t have a polymorphic type (pertaining to the object it points to) and thus the re-quantification issue never arises. So the problematic cases are due to instantiation with a polymorphic object then subsequently assigning a newly quantified object (i.e. mutating the reference type) in a re-quantified context followed by dereferencing (reading) from the (object pointed to by) reference in a subsequent re-quantified context. As aforementioned, when the re-quantified type parameters conflict, the typing complication arises and unsafe cases must be prevented/restricted.
Phew! If you understood that without reviewing linked examples, I’m impressed.
1 IMO, employing the phrase “mutable references” instead of “mutability of the referenced object” and “mutability of the reference type” would be potentially more confusing, because our intention is to mutate the object’s value (and its type) which is referenced by the pointer, not to refer to mutability of the pointer itself. Some programming languages don’t even explicitly distinguish, when disallowing mutation in the case of primitive types, between mutating the reference and mutating the object it points to.
2 Wherein an object may even be a function, in a programming language that allows first-class functions.
3 To prevent a segmentation fault at runtime due to accessing (read or write of) the referenced object with a presumption about its statically (i.e. at compile-time) determined type which is not the type that the object actually has.
4 Which are NONE and [] respectively in ML.

Why do I need a new primitive?

I am learning functional programming in Scala and reading the book FPiS (Functional Programming in Scala).
On page 131, about API design (algebra), the author mentions the following:
As we discussed in chapter 7, we’re interested in understanding what
operations are primitive and what operations are derived, and in
finding a small yet expressive set of primitives. A good way to
explore what is expressible with a given set of primitives is to pick
some concrete examples you’d like to express, and see if you can
assemble the functionality you want. As you do so, look for patterns,
try factoring out these patterns into combinators, and refine your set
of primitives. We encourage you to stop reading here and simply play
with the primitives and combinators we’ve written so far. If you want
some concrete examples to inspire you, here are some ideas:
What does he mean by expressive primitives? What is a primitive?
(Chapter 7 does not explain primitives at all.)
I imagine a primitive is the smallest building block, something that can be combined with combinators to build higher-level things.
The question from the book:
If we can generate a single Int in some range, do we need a new primitive to
generate an (Int,Int) pair in some range?
What is the answer of the question?
Is Int a primitive?
What is a primitive function?
What is a primitive?
I imagine a primitive is the smallest building block, something that can be combined with combinators to build higher-level things.
Right. So in this case: you are developing an API which has type Gen[A] denoting "a generator of values of type A". Combinators in this API will be methods which operate on Gens and produce Gens. However, you also need something to apply those combinators to: methods which produce Gens without starting with one.
One of these may be range(min: Int, max: Int): Gen[Int]. If you now need a pairInRange(min: Int, max: Int): Gen[(Int, Int)], do you need to implement it in the same way range is implemented, or can you build it from range using combinators? And if you can, what combinators do you need? In this case, you should conclude that
def pairInRange(min: Int, max: Int) = pair(range(min, max), range(min, max))
is a reasonable implementation, assuming there is a pair combinator (think what its type should be), so pairInRange doesn't need to be a primitive: it's a derived operation. Primitive operations are those which aren't derived.
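For concreteness, here is one way such a pair combinator could be written, assuming Gen exposes flatMap and map as elsewhere in the book (a sketch under those assumptions, not the book's exact code):
def pair[A, B](ga: Gen[A], gb: Gen[B]): Gen[(A, B)] =
  ga.flatMap(a => gb.map(b => (a, b)))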
You could equally have pairInRange as the primitive and def range(min: Int, max: Int) = pairInRange(min, max).map(_._1) as derived; this is just an unnatural way to design it!
(Chapter 7 does not explain primitives at all.)
Now that I've got home and have access to the book: yes, it does. If you search for "primitive" in the book (assuming you have the electronic version), you'll find this quote in 7.1.3 Explicit forking:
The function lazyUnit is a simple example of a derived combinator, as opposed to a
primitive combinator like unit. We were able to define lazyUnit just in terms of other
operations. Later, when we pick a representation for Par, lazyUnit won’t need to
know anything about this representation—its only knowledge of Par is through the
operations fork and unit that are defined on Par.
Note in particular the last sentence, which makes a point I didn't emphasize above. You should also read what follows: there are more explanations in 7.3.
What does he mean with expressive primitives?
A primitive is not expressive by itself; it's a set of primitives (including combinators) that is expressive. That is, it should allow you to express not just range and pairInRange but all the other generators you want.
Is Int a primitive?
No. Here "primitive" is used in a sense unrelated to "primitive types" (Int, Double, etc.).

Is it possible to implement F#'s infrastructure for Units of Measurement in Scala?

F# ships with special support for a units-of-measure system, which provides static type safety while compiling down to the plain numeric types instead of burdening the runtime with wrapping/unwrapping operations.
Is it possible to use some of Scala's type system magic to implement something comparable to that?
The answer is no.
Now, someone is bound to point me to Scalar, but that gives runtime checking. Perhaps, then, someone will point to the efforts of Jesper Nordenberg's type-safe units or Jim McBeath's take on it, but these are cumbersome and awkward.
I'll point, instead, to the Units compiler plugin. It gave Scala, back in 2008/2009, a pretty good system of units, as can be seen in this post. It did so, however, by extending the compiler, which would not be necessary if the type system were enough. Alas, it has not been maintained and it doesn't work anymore.
I don't know anything about it, but I just stumbled across this talk at Scala Days: https://wiki.scala-lang.org/display/SW/ScalaDays+2011+Resources#ScalaDays2011Resources-ScalaUImplementingaScalalibraryforUnitsofMeasure
Kind of. You can encode the SI units quite easily using a type-level representation of integers in a tuple of exponents. See http://svn.assembla.com/svn/metascala/src/metascala/Units.scala for an example implementation.
It should also be possible to support an extensible units system if the units are encoded as a TList of pairs of a unit type and an integer (for example, ((M, _1), (S, _2)) where M <: Unit and S <: Unit). Calculating the types for quantity operations becomes a bit more complicated in this encoding.
Regarding performance, there will always be a memory overhead for wrapping the value in a type containing the unit information. However, there is probably no performance overhead in the actual operations, as all unit checking is done at compile time.
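To make the compile-time-only checking concrete, here is a minimal phantom-type sketch (much simpler than the metascala encoding; Quantity, Meter, and Second are made-up names, and the value class reduces, though does not always eliminate, the wrapping overhead):
// The phantom type parameter U tags a quantity with its unit at compile time only
final case class Quantity[U](value: Double) extends AnyVal {
  def +(that: Quantity[U]): Quantity[U] = Quantity(value + that.value)
}

trait Meter
trait Second

val distance = Quantity[Meter](3.0)
val time = Quantity[Second](2.0)
val twice = distance + distance // compiles: units match
// distance + time // does not compile: unit mismatch caught statically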
Have a look at Units of Measure - A Scala Macro System. It seems to satisfy your requirements.