Is everything a function or expression or object in scala? - scala

I am confused.
I thought everything is expression because the statement returns a value. But I also heard that everything is an object in scala.
What is it in reality? Why did scala choose to do it one way or the other? What does that mean to a scala developer?

I thought everything is expression because the statement returns a value.
There are things that don't have values, but for the most part, this is correct. That means that we can basically drop the distinction between "statement" and "expression" in Scala.
The term "returns a value" is not quite fitting, however. We say everything "evaluates" to a value.
But I also heard that everything is an object in scala.
That doesn't contradict the previous statement at all :) It just means that every possible value is an object (so every expression evaluates to an object). By the way, functions, as first-class citizens in Scala, are objects, too.
Why did scala choose to do it one way or the other?
It has to be noted that this is in fact a generalization of the Java way, where statements and expressions are distinct things and not everything is an object. You can translate every piece of Java code to Scala without a lot of adaptions, but not the other way round. So this design decision makes Scala is in fact more powerful in terms of conciseness and expressiveness.
What does that mean to a scala developer?
It means, for example, that:
You often don't need return, because you can just put the return value as the last expression in a method
You can exploit the fact that if and case are expressions to make your code shorter
An example would be:
def mymethod(x: Int) = if (x > 2) "yay!" else "too low!"
// ...
println(mymethod(10)) // => prints "yay!"
println(mymethod(0)) // => prints "too low!"
We can also assign the value of such a compound expression to a variable:
val str = value match {
case Some(x) => "Result: " + x
case None => "Error!"
}

The distinction here is that the assertion "everything is a expression" is being made about blocks of code, whereas "everything is an object" is being made about the values in your program.
Blocks of Code
In the Java language, there are both expressions and statements. That is, any "block" of code is either an expression or a statement;
//the if-else-block is a statement whilst (x == 4) is an expression
if (x == 4) {
foo();
}
else {
bar();
}
Expressions have a type; statements do not; statements are invoked purely for their side-effects.
In scala, every block of code is an expression; that is, it has a type:
if (x == 4) foo else bar //has the type lub of foo and bar
This is extremely important! I cannot stress this enough; it's one of the things which makes scala a pleasure to work with because I can assign a value to an expression.
val x = if (x == 4) foo else bar
Values
By value, I mean something that we might reference in the program:
int i = foo(); //i is a reference to a value
java.util.TimeUnit.SECONDS;
In Java, the i above is not an object - it is a primitive. Furthermore I can access the field SECONDS of TimeUnit, but TimeUnit is not an object either; it is a static (for want of a better phrase). In scala:
val j = 4
Map.empty
As far as the language is concerned, j is an object (and I may dispatch methods to it), as is Map - the module (or companion object) for the type scala.collection.immutable.Map.

Is everything a function or expression or object in scala?
None of it.
There are things that are not objects. e.g. classes, methods.
There are things that are not expressions. e.g. class Foo { } is a statement, and does not evaluate to any value. (This is basically the same point as above.)
There are things that are not functions. I don't need to mention any examples for this one as there would be plenty in sight in any Scala code.
In other words, "Everything is a X" is nothing more than a sales pitch (in case of Scala).

Related

Spark Scala Why is it that I can make certain function calls with parenthesis and certain function calls without parenthesis [duplicate]

What are the precise rules for when you can omit (omit) parentheses, dots, braces, = (functions), etc.?
For example,
(service.findAllPresentations.get.first.votes.size) must be equalTo(2).
service is my object
def findAllPresentations: Option[List[Presentation]]
votes returns List[Vote]
must and be are both functions of specs
Why can't I go:
(service findAllPresentations get first votes size) must be equalTo(2)
?
The compiler error is:
"RestServicesSpecTest.this.service.findAllPresentations
of type
Option[List[com.sharca.Presentation]]
does not take parameters"
Why does it think I'm trying to pass in a parameter? Why must I use dots for every method call?
Why must (service.findAllPresentations get first votes size) be equalTo(2) result in:
"not found: value first"
Yet, the "must be equalTo 2" of
(service.findAllPresentations.get.first.votes.size) must be equalTo 2, that is, method chaining works fine? - object chain chain chain param.
I've looked through the Scala book and website and can't really find a comprehensive explanation.
Is it in fact, as Rob H explains in Stack Overflow question Which characters can I omit in Scala?, that the only valid use-case for omitting the '.' is for "operand operator operand" style operations, and not for method chaining?
You seem to have stumbled upon the answer. Anyway, I'll try to make it clear.
You can omit dot when using the prefix, infix and postfix notations -- the so called operator notation. While using the operator notation, and only then, you can omit the parenthesis if there is less than two parameters passed to the method.
Now, the operator notation is a notation for method-call, which means it can't be used in the absence of the object which is being called.
I'll briefly detail the notations.
Prefix:
Only ~, !, + and - can be used in prefix notation. This is the notation you are using when you write !flag or val liability = -debt.
Infix:
That's the notation where the method appears between an object and it's parameters. The arithmetic operators all fit here.
Postfix (also suffix):
That notation is used when the method follows an object and receives no parameters. For example, you can write list tail, and that's postfix notation.
You can chain infix notation calls without problem, as long as no method is curried. For example, I like to use the following style:
(list
filter (...)
map (...)
mkString ", "
)
That's the same thing as:
list filter (...) map (...) mkString ", "
Now, why am I using parenthesis here, if filter and map take a single parameter? It's because I'm passing anonymous functions to them. I can't mix anonymous functions definitions with infix style because I need a boundary for the end of my anonymous function. Also, the parameter definition of the anonymous function might be interpreted as the last parameter to the infix method.
You can use infix with multiple parameters:
string substring (start, end) map (_ toInt) mkString ("<", ", ", ">")
Curried functions are hard to use with infix notation. The folding functions are a clear example of that:
(0 /: list) ((cnt, string) => cnt + string.size)
(list foldLeft 0) ((cnt, string) => cnt + string.size)
You need to use parenthesis outside the infix call. I'm not sure the exact rules at play here.
Now, let's talk about postfix. Postfix can be hard to use, because it can never be used anywhere except the end of an expression. For example, you can't do the following:
list tail map (...)
Because tail does not appear at the end of the expression. You can't do this either:
list tail length
You could use infix notation by using parenthesis to mark end of expressions:
(list tail) map (...)
(list tail) length
Note that postfix notation is discouraged because it may be unsafe.
I hope this has cleared all the doubts. If not, just drop a comment and I'll see what I can do to improve it.
Class definitions:
val or var can be omitted from class parameters which will make the parameter private.
Adding var or val will cause it to be public (that is, method accessors and mutators are generated).
{} can be omitted if the class has no body, that is,
class EmptyClass
Class instantiation:
Generic parameters can be omitted if they can be inferred by the compiler. However note, if your types don't match, then the type parameter is always infered so that it matches. So without specifying the type, you may not get what you expect - that is, given
class D[T](val x:T, val y:T);
This will give you a type error (Int found, expected String)
var zz = new D[String]("Hi1", 1) // type error
Whereas this works fine:
var z = new D("Hi1", 1)
== D{def x: Any; def y: Any}
Because the type parameter, T, is inferred as the least common supertype of the two - Any.
Function definitions:
= can be dropped if the function returns Unit (nothing).
{} for the function body can be dropped if the function is a single statement, but only if the statement returns a value (you need the = sign), that is,
def returnAString = "Hi!"
but this doesn't work:
def returnAString "Hi!" // Compile error - '=' expected but string literal found."
The return type of the function can be omitted if it can be inferred (a recursive method must have its return type specified).
() can be dropped if the function doesn't take any arguments, that is,
def endOfString {
return "myDog".substring(2,1)
}
which by convention is reserved for methods which have no side effects - more on that later.
() isn't actually dropped per se when defining a pass by name paramenter, but it is actually a quite semantically different notation, that is,
def myOp(passByNameString: => String)
Says myOp takes a pass-by-name parameter, which results in a String (that is, it can be a code block which returns a string) as opposed to function parameters,
def myOp(functionParam: () => String)
which says myOp takes a function which has zero parameters and returns a String.
(Mind you, pass-by-name parameters get compiled into functions; it just makes the syntax nicer.)
() can be dropped in the function parameter definition if the function only takes one argument, for example:
def myOp2(passByNameString:(Int) => String) { .. } // - You can drop the ()
def myOp2(passByNameString:Int => String) { .. }
But if it takes more than one argument, you must include the ():
def myOp2(passByNameString:(Int, String) => String) { .. }
Statements:
. can be dropped to use operator notation, which can only be used for infix operators (operators of methods that take arguments). See Daniel's answer for more information.
. can also be dropped for postfix functions
list tail
() can be dropped for postfix operators
list.tail
() cannot be used with methods defined as:
def aMethod = "hi!" // Missing () on method definition
aMethod // Works
aMethod() // Compile error when calling method
Because this notation is reserved by convention for methods that have no side effects, like List#tail (that is, the invocation of a function with no side effects means that the function has no observable effect, except for its return value).
() can be dropped for operator notation when passing in a single argument
() may be required to use postfix operators which aren't at the end of a statement
() may be required to designate nested statements, ends of anonymous functions or for operators which take more than one parameter
When calling a function which takes a function, you cannot omit the () from the inner function definition, for example:
def myOp3(paramFunc0:() => String) {
println(paramFunc0)
}
myOp3(() => "myop3") // Works
myOp3(=> "myop3") // Doesn't work
When calling a function that takes a by-name parameter, you cannot specify the argument as a parameter-less anonymous function. For example, given:
def myOp2(passByNameString:Int => String) {
println(passByNameString)
}
You must call it as:
myOp("myop3")
or
myOp({
val source = sourceProvider.source
val p = myObject.findNameFromSource(source)
p
})
but not:
myOp(() => "myop3") // Doesn't work
IMO, overuse of dropping return types can be harmful for code to be re-used. Just look at specification for a good example of reduced readability due to lack of explicit information in the code. The number of levels of indirection to actually figure out what the type of a variable is can be nuts. Hopefully better tools can avert this problem and keep our code concise.
(OK, in the quest to compile a more complete, concise answer (if I've missed anything, or gotten something wrong/inaccurate please comment), I have added to the beginning of the answer. Please note this isn't a language specification, so I'm not trying to make it exactly academically correct - just more like a reference card.)
A collection of quotes giving insight into the various conditions...
Personally, I thought there'd be more in the specification. I'm sure there must be, I'm just not searching for the right words...
There are a couple of sources however, and I've collected them together, but nothing really complete / comprehensive / understandable / that explains the above problems to me...:
"If a method body has more than one
expression, you must surround it with
curly braces {…}. You can omit the
braces if the method body has just one
expression."
From chapter 2, "Type Less, Do More", of Programming Scala:
"The body of the upper method comes
after the equals sign ‘=’. Why an
equals sign? Why not just curly braces
{…}, like in Java? Because semicolons,
function return types, method
arguments lists, and even the curly
braces are sometimes omitted, using an
equals sign prevents several possible
parsing ambiguities. Using an equals
sign also reminds us that even
functions are values in Scala, which
is consistent with Scala’s support of
functional programming, described in
more detail in Chapter 8, Functional
Programming in Scala."
From chapter 1, "Zero to Sixty: Introducing Scala", of Programming Scala:
"A function with no parameters can be
declared without parentheses, in which
case it must be called with no
parentheses. This provides support for
the Uniform Access Principle, such
that the caller does not know if the
symbol is a variable or a function
with no parameters.
The function body is preceded by "="
if it returns a value (i.e. the return
type is something other than Unit),
but the return type and the "=" can be
omitted when the type is Unit (i.e. it
looks like a procedure as opposed to a
function).
Braces around the body are not
required (if the body is a single
expression); more precisely, the body
of a function is just an expression,
and any expression with multiple parts
must be enclosed in braces (an
expression with one part may
optionally be enclosed in braces)."
"Functions with zero or one argument
can be called without the dot and
parentheses. But any expression can
have parentheses around it, so you can
omit the dot and still use
parentheses.
And since you can use braces anywhere
you can use parentheses, you can omit
the dot and put in braces, which can
contain multiple statements.
Functions with no arguments can be
called without the parentheses. For
example, the length() function on
String can be invoked as "abc".length
rather than "abc".length(). If the
function is a Scala function defined
without parentheses, then the function
must be called without parentheses.
By convention, functions with no
arguments that have side effects, such
as println, are called with
parentheses; those without side
effects are called without
parentheses."
From blog post Scala Syntax Primer:
"A procedure definition is a function
definition where the result type and
the equals sign are omitted; its
defining expression must be a block.
E.g., def f (ps) {stats} is
equivalent to def f (ps): Unit =
{stats}.
Example 4.6.3 Here is a declaration
and a de?nition of a procedure named
write:
trait Writer {
def write(str: String)
}
object Terminal extends Writer {
def write(str: String) { System.out.println(str) }
}
The code above is implicitly completed
to the following code:
trait Writer {
def write(str: String): Unit
}
object Terminal extends Writer {
def write(str: String): Unit = { System.out.println(str) }
}"
From the language specification:
"With methods which only take a single
parameter, Scala allows the developer
to replace the . with a space and omit
the parentheses, enabling the operator
syntax shown in our insertion operator
example. This syntax is used in other
places in the Scala API, such as
constructing Range instances:
val firstTen:Range = 0 to 9
Here again, to(Int) is a vanilla
method declared inside a class
(there’s actually some more implicit
type conversions here, but you get the
drift)."
From Scala for Java Refugees Part 6: Getting Over Java:
"Now, when you try "m 0", Scala
discards it being a unary operator, on
the grounds of not being a valid one
(~, !, - and +). It finds that "m" is
a valid object -- it is a function,
not a method, and all functions are
objects.
As "0" is not a valid Scala
identifier, it cannot be neither an
infix nor a postfix operator.
Therefore, Scala complains that it
expected ";" -- which would separate
two (almost) valid expressions: "m"
and "0". If you inserted it, then it
would complain that m requires either
an argument, or, failing that, a "_"
to turn it into a partially applied
function."
"I believe the operator syntax style
works only when you've got an explicit
object on the left-hand side. The
syntax is intended to let you express
"operand operator operand" style
operations in a natural way."
Which characters can I omit in Scala?
But what also confuses me is this quote:
"There needs to be an object to
receive a method call. For instance,
you cannot do “println “Hello World!”"
as the println needs an object
recipient. You can do “Console
println “Hello World!”" which
satisfies the need."
Because as far as I can see, there is an object to receive the call...
I find it easier to follow this rule of thumb: in expressions spaces alternate between methods and parameters. In your example, (service.findAllPresentations.get.first.votes.size) must be equalTo(2) parses as (service.findAllPresentations.get.first.votes.size).must(be)(equalTo(2)). Note that the parentheses around the 2 have a higher associativity than the spaces. Dots also have higher associativity, so (service.findAllPresentations.get.first.votes.size) must be.equalTo(2)would parse as (service.findAllPresentations.get.first.votes.size).must(be.equalTo(2)).
service findAllPresentations get first votes size must be equalTo 2 parses as service.findAllPresentations(get).first(votes).size(must).be(equalTo).2.
Actually, on second reading, maybe this is the key:
With methods which only take a single
parameter, Scala allows the developer
to replace the . with a space and omit
the parentheses
As mentioned on the blog post: http://www.codecommit.com/blog/scala/scala-for-java-refugees-part-6 .
So perhaps this is actually a very strict "syntax sugar" which only works where you are effectively calling a method, on an object, which takes one parameter. e.g.
1 + 2
1.+(2)
And nothing else.
This would explain my examples in the question.
But as I said, if someone could point out to be exactly where in the language spec this is specified, would be great appreciated.
Ok, some nice fellow (paulp_ from #scala) has pointed out where in the language spec this information is:
6.12.3:
Precedence and associativity of
operators determine the grouping of
parts of an expression as follows.
If there are several infix operations in an expression, then
operators with higher precedence bind
more closely than operators with lower
precedence.
If there are consecutive infix operations e0 op1 e1 op2 . . .opn en
with operators op1, . . . , opn of the
same precedence, then all these
operators must have the same
associativity. If all operators are
left-associative, the sequence is
interpreted as (. . . (e0 op1 e1) op2
. . .) opn en. Otherwise, if all
operators are rightassociative, the
sequence is interpreted as e0 op1 (e1
op2 (. . .opn en) . . .).
Postfix operators always have lower precedence than infix operators. E.g.
e1 op1 e2 op2 is always equivalent to
(e1 op1 e2) op2.
The right-hand operand of a
left-associative operator may consist
of several arguments enclosed in
parentheses, e.g. e op (e1, . . .
,en). This expression is then
interpreted as e.op(e1, . . . ,en).
A left-associative binary operation e1
op e2 is interpreted as e1.op(e2). If
op is rightassociative, the same
operation is interpreted as { val
x=e1; e2.op(x ) }, where x is a fresh
name.
Hmm - to me it doesn't mesh with what I'm seeing or I just don't understand it ;)
There aren't any. You will likely receive advice around whether or not the function has side-effects. This is bogus. The correction is to not use side-effects to the reasonable extent permitted by Scala. To the extent that it cannot, then all bets are off. All bets. Using parentheses is an element of the set "all" and is superfluous. It does not provide any value once all bets are off.
This advice is essentially an attempt at an effect system that fails (not to be confused with: is less useful than other effect systems).
Try not to side-effect. After that, accept that all bets are off. Hiding behind a de facto syntactic notation for an effect system can and does, only cause harm.

What is "value" in pure functional programming?

What constitutes a value in pure functional programming?
I am asking myself these questions after seeing a sentence:
Task(or IO) has a constructor that captures side-effects as values.
Is a function a value?
If so, what does it mean when equating two functions: assert(f == g). For two functions that are equivalent but defined separately => f != g, why don't they work as 1 == 1?
Is an object with methods a value? (for example IO { println("") })
Is an object with setter methods and mutable state a value?
Is an object with mutable state which works as a state machine a value?
How do we test whether something is a value? Is immutability a sufficient condition?
UPDATE:
I'm using Scala.
I'll try to explain what a value is by contrasting it with things that are not values.
Roughly speaking, values are structures produced by the process of evaluation which correspond to terms that cannot be simplified any further.
Terms
First, what are terms? Terms are syntactic structures that can be evaluated. Admittedly, this is a bit circular, so let's look at a few examples:
Constant literals are terms:
42
Functions applied to other terms are terms:
atan2(123, 456 + 789)
Function literals are terms
(x: Int) => x * x
Constructor invocations are terms:
Option(42)
Contrast this to:
Class declarations / definitions are not terms:
case class Foo(bar: Int)
that is, you cannot write
val x = (case class Foo(bar: Int))
this would be illegal.
Likewise, trait and type definitions are not terms:
type Bar = Int
sealed trait Baz
Unlike function literals, method definitions are not terms:
def foo(x: Int) = x * x
for example:
val x = (a: Int) => a * 2 // function literal, ok
val y = (def foo(a: Int): Int = a * 2) // no, not a term
Package declarations and import statements are not terms:
import foo.bar.baz._ // ok
List(package foo, import bar) // no
Normal forms, values
Now, when it is hopefully somewhat clearer what a term is, what was meant by "cannot be simplified any further*? In idealized functional programming languages, you can define what a normal form, or rather weak head normal form is. Essentially, a term is in a (wh-) normal form if no reduction rules can be applied to the term to make it any simpler. Again, a few examples:
This is a term, but it's not in normal form, because it can be reduced to 42:
40 + 2
This is not in weak head normal form:
((x: Int) => x * 2)(3)
because we can further evaluate it to 6.
This lambda is in weak head normal form (it's stuck, because the computation cannot proceed until an x is supplied):
(x: Int) => x * 42
This is not in normal form, because it can be simplified further:
42 :: List(10 + 20, 20 + 30)
This is in normal form, no further simplifications possible:
List(42, 30, 50)
Thus,
42,
(x: Int) => x * 42,
List(42, 30, 50)
are values, whereas
40 + 2,
((x: Int) => x * 2)(3),
42 :: List(10 + 20, 20 + 30)
are not values, but merely non-normalized terms that can be further simplified.
Examples and non-examples
I'll just go through your list of sub-questions one-by-one:
Is a function a value
Yes, things like (x: T1, ..., xn: Tn) => body are considered to be stuck terms in WHNF, in functional languages they can actually be represented, so they are values.
If so, what does it mean when equating two functions: assert(f == g) for two functions that are equivalent but defined separately => f != g, why don't they work as 1 == 1?
Function extensionality is somewhat unrelated to the question whether something is a value or not. In the above "definition by example", I talked only about the shape of the terms, not about the existence / non-existence of some computable relations defined on those terms. The sad fact is that you can't even really determine whether a lambda-expression actually represents a function (i.e. whether it terminates for all inputs), and it is also known that there cannot be an algorithm that could determine whether two functions produce the same output for all inputs (i.e. are extensionally equal).
Is an object with methods a value? (for example IO { println("") })
Not quite clear what you're asking here. Objects don't have methods. Classes have methods. If you mean method invocations, then, no, they are terms that can be further simplified (by actually running the method), so they are not values.
Is an object with setter methods and mutable state a value?
Is an object with mutable state which works as a state machine a value?
There is no such thing in pure functional programming.
What constitutes a value in pure functional programming?
Background
In pure functional programming there is no mutation. Hence, code such as
case class C(x: Int)
val a = C(42)
val b = C(42)
would become equivalent to
case class C(x: Int)
val a = C(42)
val b = a
since, in pure functional programming, if a.x == b.x, then we would have a == b. That is, a == b would be implemented comparing the values inside.
However, Scala is not pure, since it allows mutation, like Java. In such case, we do NOT have the equivalence between the two snippets above, when we declare case class C(var x: Int). Indeed, performing a.x += 1 afterwords does not affect b.x in the first snippet, but does in the second one, where a and b point to the same object. In such case, it is useful to have a comparison a == b which compares the object references, rather than its inner integer value.
When using case class C(x: Int), Scala comparisons a == b behave closer to pure functional programming, comparing the integers values. With regular (non case) classes, Scala instead compares object references breaking the equivalence between the two snippets. But, again, Scala is not pure. By comparison, in Haskell
data C = C Int deriving (Eq)
a = C 42
b = C 42
is indeed equivalent to
data C = C Int deriving (Eq)
a = C 42
b = a
since there are no "references" or "object identities" in Haskell. Note that the Haskell implementation likely will allocate two "objects" in the first snippet, and only one object in the second one, but since there is no way to tell them apart inside Haskell, the program output will be the same.
Answer
Is a function a value ? (then what it means when equating two function: assert(f==g). For two function that is equivalent but defined separately => f!=g, why not they work like 1==1)
Yes, functions are values in pure functional programming.
Above, when you mention "function that is equivalent but defined separately", you are assuming that we can compare the "references" or "object identities" for these two functions. In pure functional programming we can not.
Pure functional programming should compare functions making f == g equivalent to f x == g x for all possible arguments x. This is feasible when there is only a few values for x, e.g. if f,g :: Bool -> Int we only need to check x=True, x=False. For functions having infinite domains, this is much harder. For instance, if f,g :: String -> Int we can not check infinitely many strings.
Theoretical computer science (computability theory) also proved that there is no algorithm to compare two functions String -> Int, not even an inefficient algorithm, not even if we have access to the source code of the two functions. For this mathematical reason, we must accept that functions are values that can not be compared. In Haskell, we express this through the Eq typeclass, stating that almost all the standard types are comparable, functions being the exception.
Is an object with methods a value ? (for example, IO{println("")})
Yes. Roughly speaking, "everything is a value", including IO actions.
Is an object with setter methods and mutable states a value ?
Is an object with mutable states and works as a state machine a value ?
There is no mutable state in pure functional programming.
At best, the setters can produce a "new" object with the modified fields.
And yes, the object would be a value.
How do we test if it is a value, is that immutable can be a sufficient condition to be a value ?
In pure functional programming, we can only have immutable data.
In impure functional programming, I think we can call most immutable objects "values", when we do not compare object references. If the "immutable" object contains a reference to a mutable object, e.g.
case class D(var x: Int)
case class C(c: C)
val a = C(D(42))
then things are more tricky. I guess we could still call a "immutable", since we can not alter a.c, but we should be careful since a.c.x can be mutated.
Depending on the intent, I think that some would not call a immutable. I would not consider a to be a value.
To make things more muddy, in impure programming, there are objects which use mutation to present a "pure" interface in an efficient way. For instance one can write a pure function that, before returning, stores its result in a cache. When called again on the same argument, it will return the previously computed result
(this is usually called memoization). Here, mutation happens, but it is not observable from outside, where at most we can observe a faster implementation. In this case, we can simply pretend the that function is pure (even if it performs mutation) and consider it a "value".
The contrast with imperative languages is stark. In inperitive languages, like Python, the output of a function is directed. It can be assigned to a variable, explicitly returned, printed or written to a file.
When I compose a function in Haskell, I never consider output. I never use "return" Everything has "a" value. This is called "symbolic" programming. By "everything", is meant "symbols". Like human language, the nouns and verbs represent something. That something is their value. The "value" of "Pete" is Pete. The name "Pete" is not Pete but is a representation of Pete, the person. The same is true of functional programming. The best analogy is math or logic When you do pages of calculations, do you direct the output of each function? You even "assign" variables to be replaced by their "value" in functions or expressions.
Values are
Immutable/Timeless
Anonymous
Semantically Transparent
What is the value of 42? 42. What is the "value" of new Date()? Date object at 0x3fa89c3. What is the identity of 42? 42. What is the identity of new Date()? As we saw in the previous example, it's the thing that lives at the place. It may have many different "values" in different contexts but it has only one identity. OTOH, 42 is sufficient unto itself. It's semantically meaningless to ask where 42 lives in the system. What is the semantic meaning of 42? Magnitude of 42. What is the semantic meaning of new Foo()? Who knows.
I would add a fourth criterion (see this in some contexts in the wild but not others) which is: values are language agnostic (I'm not certain the first 3 are sufficient to guarantee this nor that such a rule is entirely consistent with most people's intuition of what value means).
Values are things that
functions can take as inputs and return as outputs, that is, can be computed, and
are members of a type, that is, elements of some set, and
can be bound to a variable, that is, can be named.
First point is really the crucial test whether something is a value. Perhaps the word value, due to conditioning, might immediately make us think of just numbers, but the concept is very general. Essentially anything we can give to and get out of a function can be considered a value. Numbers, strings, booleans, instances of classes, functions themselves, predicates, and even types themselves, can be inputs and outputs of functions, and thus are values.
IO monad is a great example of how general this concept is. When we say IO monad models side-effects as values, we mean a function can take a side-effect (say println) as input and return as output. IO(println(...)) separates the idea of the effect of an action of println from the actual execution of an action, and allows these effects to be considered as first class values that can be computed with using the same language facilities as for any other values such as numbers.

Scala type hierarchy

I looked at Scala Type Hierarchy
It's pretty clear that Unit is a subtype of AnyVal. So I tried this:
object Main extends App {
val u : Unit = Seq[String]()
}
and it compiled fine. But I expected some error. DEMO.
Why? Unit is not a supertype of a Seq
This happens because Unit can also be inferred from statements (in contrast with expressions). This means that if you use the Unit type annotation, the compiler will ignore the last value in your expression and return Unit after your actual code.
You can simply testing this, by casting your u back to its original type. You will get a cast error, because u isn't actually bound to your Seq.
Here's what happens if you run this in the REPL:
scala> u.asInstanceOf[Seq[String]]
java.lang.ClassCastException: scala.runtime.BoxedUnit cannot be cast to scala.collection.Seq
As Jasper-M correctly pointed out, the compiler will rewrite this to
val u: Unit = { Seq[String](); () }
Why? Unit is not a supertype of a Seq
You are assuming that the Seq is assigned to u. But it isn't:
println(u)
# ()
As you can see, the value of u is () (i.e. the unique singleton instance of Unit), not an empty Seq.
This is due to Value Discarding, which is described in clause 5 of section 6.26.1 Value Conversions of the Scala Language Specification:
The following seven implicit conversions can be applied to an expression e which has some value type T and which is type-checked with some expected type pt.
[…]
If e has some value type and the expected type is Unit, e is converted to the expected type by embedding it in the term { e; () }.
In plain English as opposed to "Speclish", what this means is: if you say "this is of type Unit", but it actually isn't, the compiler will return () for you.
The Unit type is sort-of the equivalent to a statement: statements have no value. Well, in Scala, everything has a value, but we have () which is a useless value, so what would in other languages be a statement which has no value, is in Scala an expression which has a useless value. And just like other languages which distinguish between expressions and statements have such things like statement expressions and expression statements (e.g. ECMAScript, Java, C, C++, C♯), Scala has Value Discarding, which is essentially its analog for an Expression Statement (i.e. a statement which contains an expression whose value is discarded).
For example, in C you are allowed to write
printf("Hello, World");
even though printf is a function which returns a size_t. But, C will happily allow you to use printf as a statement and simply discard the return value. Scala does the same.
Whether or not that's a good thing is a different question. Would it be better if C forced you to write
size_t ignoreme = printf("Hello, World");
or if Scala forced you to write
val _ = processKeyBinding​(…)
// or
{ processKeyBinding​(…); () }
instead of just
processKeyBinding​(…)
I don't know. However, one of Scala's design goals is to integrate well with the host platform and that does not just mean high-performance language interoperability with other languages, it also means easy onboarding of existing developers, so there are some features in Scala that are not actually needed but exist for familiarity with developers from the host platform (e.g. Java, C♯, ECMAScript) – Value Discarding is one of them, while loops are another.

Is there a reason why assignments in scala evaluate to Unit? [duplicate]

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.