Does My imposed Condition have a name in category theory/Set theory? - scala

There's a requirement i'm placing on the signature of a total & referencially transparent function:
def add[T](a: T)(b: T): T
//requirement is type T under e.g. addition must always bear antoher type T
// and is not allowed to throw runtime arithmetic exceptions or such.
this requirement can be easily fulfilled for many types such as Int,String,Nat(natural numbers); yet is also easily violated by types such as NonZeroInt as addition of two non-zero integers can in fact be zero.
My question is there a coined term for this condition? Monoid comes to mind but it's obvious I'm not imposing all the rules for monoids here.

If I understand what you're asking, then the term you are looking for is "closure" for a set given an operation. Refer to the mathematical definition in Wikipedia here. In short:
A set has closure under an operation if performance of that operation
on members of the set always produces a member of the same set
However, "closure" seems to have a different meaning in computer science. See the link here. And my searches pertaining to closure in the context of Scala, even when also putting it in the context of mathematics or set-theory, doesn't lead to any helpful results. Perhaps this is why you've had trouble finding the coined term.

Without any laws required it is just anot operation on a set T, nothing more. If add is associative you can call it Semigroup.

Related

Does Scala have a value restriction like ML, if not then why?

Here’s my thoughts on the question. Can anyone confirm, deny, or elaborate?
I wrote:
Scala doesn’t unify covariant List[A] with a GLB ⊤ assigned to List[Int], bcz afaics in subtyping “biunification” the direction of assignment matters. Thus None must have type Option[⊥] (i.e. Option[Nothing]), ditto Nil type List[Nothing] which can’t accept assignment from an Option[Int] or List[Int] respectively. So the value restriction problem originates from directionless unification and global biunification was thought to be undecidable until the recent research linked above.
You may wish to view the context of the above comment.
ML’s value restriction will disallow parametric polymorphism in (formerly thought to be rare but maybe more prevalent) cases where it would otherwise be sound (i.e. type safe) to do so such as especially for partial application of curried functions (which is important in functional programming), because the alternative typing solutions create a stratification between functional and imperative programming as well as break encapsulation of modular abstract types. Haskell has an analogous dual monomorphisation restriction. OCaml has a relaxation of the restriction in some cases. I elaborated about some of these details.
EDIT: my original intuition as expressed in the above quote (that the value restriction may be obviated by subtyping) is incorrect. The answers IMO elucidate the issue(s) well and I’m unable to decide which in the set containing Alexey’s, Andreas’, or mine, should be the selected best answer. IMO they’re all worthy.
As I explained before, the need for the value restriction -- or something similar -- arises when you combine parametric polymorphism with mutable references (or certain other effects). That is completely independent from whether the language has type inference or not or whether the language also allows subtyping or not. A canonical counter example like
let r : ∀A.Ref(List(A)) = ref [] in
r := ["boo"];
head(!r) + 1
is not affected by the ability to elide the type annotation nor by the ability to add a bound to the quantified type.
Consequently, when you add references to F<: then you need to impose a value restriction to not lose soundness. Similarly, MLsub cannot get rid of the value restriction. Scala enforces a value restriction through its syntax already, since there is no way to even write the definition of a value that would have polymorphic type.
It's much simpler than that. In Scala values can't have polymorphic types, only methods can. E.g. if you write
val id = x => x
its type isn't [A] A => A.
And if you take a polymorphic method e.g.
def id[A](x: A): A = x
and try to assign it to a value
val id1 = id
again the compiler will try (and in this case fail) to infer a specific A instead of creating a polymorphic value.
So the issue doesn't arise.
EDIT:
If you try to reproduce the http://mlton.org/ValueRestriction#_alternatives_to_the_value_restriction example in Scala, the problem you run into isn't the lack of let: val corresponds to it perfectly well. But you'd need something like
val f[A]: A => A = {
var r: Option[A] = None
{ x => ... }
}
which is illegal. If you write def f[A]: A => A = ... it's legal but creates a new r on each call. In ML terms it would be like
val f: unit -> ('a -> 'a) =
fn () =>
let
val r: 'a option ref = ref NONE
in
fn x =>
let
val y = !r
val () = r := SOME x
in
case y of
NONE => x
| SOME y => y
end
end
val _ = f () 13
val _ = f () "foo"
which is allowed by the value restriction.
That is, Scala's rules are equivalent to only allowing lambdas as polymorphic values in ML instead of everything value restriction allows.
EDIT: this answer was incorrect before. I have completely rewritten the explanation below to gather my new understanding from the comments under the answers by Andreas and Alexey.
The edit history and the history of archives of this page at archive.is provides a recording of my prior misunderstanding and discussion. Another reason I chose to edit rather than delete and write a new answer, is to retain the comments on this answer. IMO, this answer is still needed because although Alexey answers the thread title correctly and most succinctly—also Andreas’ elaboration was the most helpful for me to gain understanding—yet I think the layman reader may require a different, more holistic (yet hopefully still generative essence) explanation in order to quickly gain some depth of understanding of the issue. Also I think the other answers obscure how convoluted a holistic explanation is, and I want naive readers to have the option to taste it. The prior elucidations I’ve found don’t state all the details in English language and instead (as mathematicians tend to do for efficiency) rely on the reader to discern the details from the nuances of the symbolic programming language examples and prerequisite domain knowledge (e.g. background facts about programming language design).
The value restriction arises where we have mutation of referenced1 type parametrised objects2. The type unsafety that would result without the value restriction is demonstrated in the following MLton code example:
val r: 'a option ref = ref NONE
val r1: string option ref = r
val r2: int option ref = r
val () = r1 := SOME "foo"
val v: int = valOf (!r2)
The NONE value (which is akin to null) contained in the object referenced by r can be assigned to a reference with any concrete type for the type parameter 'a because r has a polymorphic type a'. That would allow type unsafety because as shown in the example above, the same object referenced by r which has been assigned to both string option ref and int option ref can be written (i.e. mutated) with a string value via the r1 reference and then read as an int value via the r2 reference. The value restriction generates a compiler error for the above example.
A typing complication arises to prevent3 the (re-)quantification (i.e. binding or determination) of the type parameter (aka type variable) of a said reference (and the object it points to) to a type which differs when reusing an instance of said reference that was previously quantified with a different type.
Such (arguably bewildering and convoluted) cases arise for example where successive function applications (aka calls) reuse the same instance of such a reference. IOW, cases where the type parameters (pertaining to the object) for a reference are (re-)quantified each time the function is applied, yet the same instance of the reference (and the object it points to) being reused for each subsequent application (and quantification) of the function.
Tangentially, the occurrence of these is sometimes non-intuitive due to lack of explicit universal quantifier ∀ (since the implicit rank-1 prenex lexical scope quantification can be dislodged from lexical evaluation order by constructions such as let or coroutines) and the arguably greater irregularity (as compared to Scala) of when unsafe cases may arise in ML’s value restriction:
Andreas wrote:
Unfortunately, ML does not usually make the quantifiers explicit in its syntax, only in its typing rules.
Reusing a referenced object is for example desired for let expressions which analogous to math notation, should only create and evaluate the instantiation of the substitutions once even though they may be lexically substituted more than once within the in clause. So for example, if the function application is evaluated as (regardless of whether also lexically or not) within the in clause whilst the type parameters of substitutions are re-quantified for each application (because the instantiation of the substitutions are only lexically within the function application), then type safety can be lost if the applications aren’t all forced to quantify the offending type parameters only once (i.e. disallow the offending type parameter to be polymorphic).
The value restriction is ML’s compromise to prevent all unsafe cases while also preventing some (formerly thought to be rare) safe cases, so as to simplify the type system. The value restriction is considered a better compromise, because the early (antiquated?) experience with more complicated typing approaches that didn’t restrict any or as many safe cases, caused a bifurcation between imperative and pure functional (aka applicative) programming and leaked some of the encapsulation of abstract types in ML functor modules. I cited some sources and elaborated here. Tangentially though, I’m pondering whether the early argument against bifurcation really stands up against the fact that value restriction isn’t required at all for call-by-name (e.g. Haskell-esque lazy evaluation when also memoized by need) because conceptually partial applications don’t form closures on already evaluated state; and call-by-name is required for modular compositional reasoning and when combined with purity then modular (category theory and equational reasoning) control and composition of effects. The monomorphisation restriction argument against call-by-name is really about forcing type annotations, yet being explicit when optimal memoization (aka sharing) is required is arguably less onerous given said annotation is needed for modularity and readability any way. Call-by-value is a fine tooth comb level of control, so where we need that low-level control then perhaps we should accept the value restriction, because the rare cases that more complex typing would allow would be less useful in the imperative versus applicative setting. However, I don’t know if the two can be stratified/segregated in the same programming language in smooth/elegant manner. Algebraic effects can be implemented in a CBV language such as ML and they may obviate the value restriction. IOW, if the value restriction is impinging on your code, possibly it’s because your programming language and libraries lack a suitable metamodel for handling effects.
Scala makes a syntactical restriction against all such references, which is a compromise that restricts for example the same and even more cases (that would be safe if not restricted) than ML’s value restriction, but is more regular in the sense that we’ll not be scratching our head about an error message pertaining to the value restriction. In Scala, we’re never allowed to create such a reference. Thus in Scala, we can only express cases where a new instance of a reference is created when it’s type parameters are quantified. Note OCaml relaxes the value restriction in some cases.
Note afaik both Scala and ML don’t enable declaring that a reference is immutable1, although the object they point to can be declared immutable with val. Note there’s no need for the restriction for references that can’t be mutated.
The reason that mutability of the reference type1 is required in order to make the complicated typing cases arise, is because if we instantiate the reference (e.g. in for example the substitutions clause of let) with a non-parametrised object (i.e. not None or Nil4 but instead for example a Option[String] or List[Int]), then the reference won’t have a polymorphic type (pertaining to the object it points to) and thus the re-quantification issue never arises. So the problematic cases are due to instantiation with a polymorphic object then subsequently assigning a newly quantified object (i.e. mutating the reference type) in a re-quantified context followed by dereferencing (reading) from the (object pointed to by) reference in a subsequent re-quantified context. As aforementioned, when the re-quantified type parameters conflict, the typing complication arises and unsafe cases must be prevented/restricted.
Phew! If you understood that without reviewing linked examples, I’m impressed.
1 IMO to instead employ the phrase “mutable references” instead of “mutability of the referenced object” and “mutability of the reference type” would be more potentially confusing, because our intention is to mutate the object’s value (and its type) which is referenced by the pointer— not referring to mutability of the pointer of what the reference points to. Some programming languages don’t even explicitly distinguish when they’re disallowing in the case of primitive types a choice of mutating the reference or the object they point to.
2 Wherein an object may even be a function, in a programming language that allows first-class functions.
3 To prevent a segmentation fault at runtime due to accessing (read or write of) the referenced object with a presumption about its statically (i.e. at compile-time) determined type which is not the type that the object actually has.
4 Which are NONE and [] respectively in ML.

Scala: when to use explicit type annotations

I've been reading a lot of other people's Scala code recently, and one of the things that I have difficultly with (coming from Java) is a lack of explicit type annotations.
It's certainly convenient when writing code to be able to leave out type annotations -- however when reading code I often find that explicit type annotations help me to understand at a glance what code is doing more easily.
The Scala style guide (http://docs.scala-lang.org/style/types.html) doesn't seem to provide any definitive guidance on this, stating:
Use type inference where possible, but put clarity first, and favour explicitness in public APIs.
To my mind, this is a bit contradictory. While it's clearly obvious what type this variable is:
val tokens = new HashMap[String, Int]
It's not so obvious what type this one is:
val tokens = readTokens()
So, if I was putting clarity first I would probably annotate all variables where the type is not already declared on the same line.
Do any Scala practitioners have guidance on this? Am I crazy to be considering adding type annotations to my local variables? I'm particularly interested in hearing from folks who spend a lot of time reading scala code (for example, in code reviews), as well as writing it.
It's not so obvious what type this one is:
val tokens = readTokens()
Good names are important: the name is plural, ergo it returns some collection of some kind. The most general collection types in Scala are Traversable and Iterator, and they mostly share a common interface, so it's not really important which one of the two it is. The name also talks about "reading tokens", ergo it obviously should return Tokens in some fashion. And last but not least, the method call has parentheses, which according to the style guide means it has side-effects, so I wouldn't count on being able to traverse the collection more than once.
Ergo, the return type is something like
Traversable[Token]
or
Iterator[Token]
and which of the two it is doesn't really matter because their client interfaces are mostly identical.
Note also that the latter constraint (only traversing the collection once) isn't even captured in the type, even if you were providing an explicit type, you would still have to look at the name and the style!

Motivation for Scala underscore in terms of formal language theory and good style?

Why is it that many people say that using underscore is good practice in Scala and makes your code more readable? They say the motivation comes from formal language theory. Nevertheless many programmers, particularly from other languages, especially those that have anonymous functions, prefer not to use underscores particularly for placeholders.
So what is the point in the underscore? Why does Scala (and some other functional languages as pointed by om-nom-nom) have the underscore? And what is the formal underpinning, in terms of complexity and language theory, as to why it often good style to use it?
Linguistics
The origin and motivation for most of the underscore uses in Scala is to allow one to construct expressions and declarations without the need to always give every variable (I mean "variable" as in Predicate Calculus, not in programming) of the language a name. We use this all the time in Natural Language, for example I referred to a concept in the previous sentence in this sentence using "this" and I referred to this sentence using "this" without there being any confusion over what I mean. In Natural Language these words are usually called "pronouns", "anaphors", "cataphors", the referents "antecedent" or "postcedent", and the process of understanding/dereferencing them is called "anaphora".
Algorithmic Information Theory
If we had to name every 'thing' in Natural Language before we can refer to it, similarly every type of thing in order to quantify over it, as in Predicate Calculus and in most programming languages, then speaking would become extremely long winded. It is thanks to context that we can infer what is meant by words like "this", "it", "that", etc, we do it easily.
Therefore why restrict this simple, elegant and efficient means to communicate to Natural Language? So it was added to Scala.
If we did attempt to name every single 'thing' or 'type of thing', sentences become so long and complicated that it becomes very difficult to understand due to it's verbosity and the introduction of redundant symbols. The more symbols you add to a sentence the more difficult it becomes to understand, ergo this is why it's good practice, not only in Natural Language, but in Scala too. In fact one could formalize this assertion in terms of Kolmogorov Complexity and prove that a sequence of sentences adopting placeholders have lower complexity than those that unnecessarily name everything (unless the name is exactly the same in every instance, but that usually doesn't make sense). Therefore we can conclusively say contrary to some programmers belief, that the placeholder syntax is simpler and easier to read.
The reason why it has some resistance in it's use, is that if one is already a programmer, one must make an effort to retrain the brain not to name everything, just as (if they can remember) they may have found learning to code in the first place required quite an effort.
Examples
Now let's look at some specific uses more formally:
Placeholder Syntax
Means "it", "them", "that", "their" etc (i.e. pronouns), e.g. 1
lines.map(_.length)
can be read as "map lines to their length", similarly we can read lineOption.map(_.length) as "map the line to it's length". In terms of complexity theory, this is simpler than "for each 'line' in lines, take the length of 'line'" - which would be lines.map(line => line.length).
Can also be read as "the" (definite article) when used with type annotation, e.g.
(_: Int) + 1
"Add 1 to the integer"
Existential Types
Means "of some type" ("some" the pronoun), e.g
foo: Option[_]
means "foo is an Option of some type".
Higher Kinded type parameters
Again, basically means "of some type" ("some" the pronoun), e.g.
class A[K[_],T](a: K[T])
Can be read "class A takes some K of some type ..."
Pattern Match Wildcards
Means "anything" or "whatever" (pronouns), e.g.
case Foo(_) => "hello"
can be read as "for a Foo containing anything, return 'hello'", or "for a Foo containing whatever, return 'hello'"
Import Wildcards
Means "everything" (pronoun), e.g.
import foo._
can be read as "import everything from foo".
Default Values
Now I read this like "a" (indefinite article), e.g.
val wine: RedWine = _
"Give me a red wine", the waiter should give you the house red.
Other uses of underscore
The other uses of underscores are not really related to the point of this Q&A, nevertheless we breifly discuss them
Ignored Values/Params/Extractions
Allow us to ignore things in an explicit 'pattern safe' way. E.g.
val (x, _) = getMyPoint
Says, we are not going to use the second coordinate, so no need to get freaky when you cant find a use in the code.
Import Hidding
Just a way to say "except" (preposition).
Function Application
E.g.
val f: String => Unit = println _
This is an interesting one as it has an exact analogue in linguistics, namely nominalization, "the use of a verb, an adjective, or an adverb as the head of a noun phrase, with or without morphological transformation" - wikipedia. More simply it is the process of turning verbs or adjectives into nouns.
Use in special method names
Purely a syntax thing and doesn't really relate to linguistics.

What do you call the data wrapped inside a monad?

In speech and writing, I keep wanting to refer to the data inside a monad, but I don't know what to call it.
For example, in Scala, the argument to the function passed to flatMap gets bound to…er…that thing inside the monad. In:
List(1, 2, 3).flatMap(x => List(x, x))
x gets bound to that thing I don't have a word for.
Complicating things a bit, the argument passed to the Kleisli arrow doesn't necessarily get bound to all the data inside the monad. With List, Set, Stream, and lots of other monads, flatMap calls the Kleisli arrow many times, binding x to a different piece of the data inside the monad each time. Or maybe not even to "data", so long as the monad laws are followed. Whatever it is, it's wrapped inside the monad, and flatMap passes it to you without the wrapper, perhaps one piece at a time. I just want to know what to call the relevant inside-the-monad stuff that x refers to, at least in part, so I can stop mit all this fumbly language.
Is there a standard or conventional term for this thing/data/value/stuff/whatever-it-is?
If not, how about "the candy"?
Trying to say "x gets bound to" is setting you up for failure. Let me explain, and guide you towards a better way of expressing yourself when talking about these sorts of things.
Suppose we have:
someList.flatMap(x => some_expression)
If we know that someList has type List[Int], then we can safely say that inside of some_expression, x is bound to a value of type Int. Notice the caveat, "inside of some_expression". This is because, given someList = List(1,2,3), x will take on the values of each of them: 1, 2, and 3, in turn.
Consider a more generalized example:
someMonadicValue.flatMap(x => some_expression)
If we know nothing about someMonadicValue, then we don't know much about how some_expression is going to be invoked. It may be run once, or three times (as in the above example), or lazily, or asynchronously, or it may be scheduled for once someMonadicValue is finished (e.g. futures), or it may never be used (e.g. empty list, None). The Monad interface does not include reasoning about when or how someExpression will be used. So all you can say about what x will be is confined to the context of some_expression, whenever and however some_expression happens to be evaluated.
So back to the example.
someMonadicValue.flatMap(x => some_expression)
You are trying to say "x is the ??? of someMonadicValue." And you are looking for the word that accurately replaces ???. Well I'm here to tell you that you're doing it wrong. If you want to speak about x, then either do it
Within the context of some_expression. In this case, use the bolded phrase I gave you above: "inside of some_expression, x is bound to a value of type Foo." Or, alternatively, you can speak about x...
With additional knowledge about which monad you're dealing with.
In case #2, for example, for someList.flatMap(x => some_expression), you could say that "x is each element of someList." For someFuture.flatMap(x => some_expression), you could say that "x is the successful future value of someFuture, if it indeed ever completes and succeeds."
You see, that's the beauty of Monads. That ??? that you are trying to describe, is the thing which the Monad interface abstracts over. Now do you see why it's so difficult to give ??? a name? It's because it has a different name and a different meaning for each particular monad. And that's the point of having the Monad abstraction: to unify these differing concepts under the same computational interface.
Disclaimer: I'm definitely not an expert in functional programming terminology and I expect that the following will not be an answer to your question from your point of view. To me the problem rather is: If choosing a term requires expert knowledge, so does understanding.
Choosing an appropriate term largely depends on:
your desired level of linguistic correctness, and
your audience, and the corresponding connotations of certain terms.
Regarding the linguistic correctness the question is whether you properly want to refer to the values/data that are bound to x, or whether you can live with a certain (incorrect) abstraction. In terms of the audience, I would mainly differentiate between an audience with a solid background in functional programming and an audience coming from other programming paradigms. In case of the former, choosing the term is probably not entirely crucial, since the concept itself is familiar, and many terms would lead to the right association. The discussion in the comments already contains some very good suggestions for this case. However, the discussion also shows that you need a certain background in functional programming to see the rationale behind some terms.
For an audience without a background in functional programming, I would rather sacrifice linguistic correctness in favor of comprehensibility. In such a situation I often just refer to it as "underlying type", just to avoid any confusion that I would probably create by trying to refer to the "thing(s) in the monad" itself. Obviously, it is literally wrong to say "x is bound to the underlying type". However, it is more important to me that my audience understands a concept at all. Since most programmers are familiar with containers and their underlying types, I'm aiming for the (flawed) association "underlying type" => "the thing(s) that are in a container" => "the thing(s) inside a monad", which often seems to work.
TL;DR: There always is a trade-off between correctness and accessibility. And when it comes to functional programming, it is sometimes helpful to shift the bias towards the latter.
flatMap does not call the Kleisli arrow many times. And "that thing" is not "inside" the monad.
flatMap lifts a Kleisli arrow to the monad. You could see this as the construction of an arrow M[A] => M[B] between types (A, B) lifted to the monad (M[A], M[B]), given a Kleisli arrow A => M[B].
So x in x => f(x) is the value being lifted.
What data?
data Monady a = Monady
Values of monads are values of monads, their wrapped type may be entirely a fiction. Which is to say that talking about it as if it exists may cause you pain.
What you want to talk about are the continuations like Monad m => a -> m b as they are guaranteed to exist. The funny stuff occurs in how (>>=) uses those continuations.
'Item' seems good? 'Item parameter' if you need to be more specific.
Either the source data can contain multiple items, or the operation can call it multiple times. Element would tend to be more specific for the first case, Value is singular as to the source & not sensible for list usages, but Item covers all cases correctly.
Disclaimer: I know more about understandable English than about FP.
Take a step back here.
A monad isn't just the functor m that wraps a value a. A monad is the stack of endofunctors (i.e., the compositions of m's), together with the join operator. That's where the famous quip -- that a monad is a monoid in the category of endofunctors, what's the problem? -- comes from.
(The whole story is that the quip means the composition of m's is another m, as witnessed by join)
A thing with the type (m a) is typically called a monad action. You can call a the result of the action.

Why doesn't scala's parallel sequences have a contains method?

Why does
List.range(0,100).contains(2)
Work, while
List.range(0,100).par.contains(2)
Does not?
This is planned for the future?
The non-teleological answer is that it's because contains is defined in SeqLike but not in ParSeqLike.
If that doesn't satisfy your curiosity, you can find that SeqLike's contains is defined thus:
def contains(elem: Any): Boolean = exists (_ == elem)
So for your example you can write
List.range(0,100).par.exists(_ == 2)
ParSeqLike is missing a few other methods as well, some of which would be hard to implement efficiently (e.g. indexOfSlice) and some for less obvious reasons (e.g. combinations - maybe because that's only useful on small datasets). But if you have a parallel collection you can also use .seq to get back to the linear version and get your methods back:
List.range(0,100).par.seq.contains(2)
As for why the library designers left it out... I'm totally guessing, but maybe they wanted to reduce the number of methods for simplicity's sake, and it's nearly as easy to use exists.
This also raises the question, why is contains defined on SeqLike rather than on the granddaddy of all collections, GenTraversableOnce, where you find exists? A possible reason is that contains for Map is semantically a different method to that on Set and Seq. A Map[A,B] is a Traversable[(A,B)], so if contains were defined for Traversable, contains would need to take a tuple (A,B) argument; however Map's contains takes just an A argument. Given this, I think contains should be defined in GenSeqLike - maybe this is an oversight that will be corrected.
(I thought at first maybe parallel sequences don't have contains because searching where you intend to stop after finding your target on parallel collections is a lot less efficient than the linear version (the various threads do a lot of unnecessary work after the value is found: see this question), but that can't be right because exists is there.)