What is a "side-effect" in Scala? - scala

I'm currently learning about functional programming using Scala.
I'm also learning about loops and how they should be avoided due to side-effects.
What does this mean?

Functions in Purely Functional Programming Languages are exactly like Functions in Mathematics: they produce a result value based on their argument values and only based on their argument values.
A Side-Effect (often just called Effect) is everything else. I.e. everything that is not reading the arguments and returning a result is a Side-Effect.
This includes, but is not limited to:
mutating state,
reading input from the console,
printing output to the console,
reading, creating, deleting, or writing to files,
reading or writing from the network,
Reflection,
depending on the current time,
launching or aborting a thread or process, or
really any kind of I/O, and most importantly
calling an impure function.
That last one is really important: calling an impure function makes a function impure. Side-Effects are infectious in this sense.
Note that saying "you are only allowed to read the arguments" is somewhat simplified. In general, we consider the environment of the function also to be kind of an "invisible" argument. That means, for example, a Closure is allowed to read variables from the environment it closes over. A function is allowed to read global variables.
Scala is an Object-Oriented Language and has Methods which have an invisible this argument, which they are allowed to read.
The important property here is called Referential Transparency. A Function or an Expression is Referentially Transparent if you can replace it with its value without changing the meaning of the program (and vice versa).
Note that generally, the terms "Pure" or "Purely Functional", "Referentially Transparent", and "Side-Effect Free" are used interchangeably.
For example, in this following program, the (sub-)expression 2 + 3 is Referentially Transparent because I can replace it with its value 5 without changing the meaning of the program:
println(2 + 3)
has exactly the same meaning as
println(5)
However, the println Method is not Referentially Transparent, because if I replace it with its value, the meaning of the program changes:
println(2 + 3)
does not have the same meaning as
()
Which is simply the value () (pronounced "unit"), which is the return value of println.
A consequence of this is that a Referentially Transparent Function always returns the same result value when passed the same arguments. For all code, you should get the same output for the same input. Or more generally, if you do the same thing over and over, the same result should happen over and over.
And that's where the connection between loops and Side-Effects lies: a loop is doing the same thing over and over. So, it should have the same result over and over. But it doesn't: it will have a different result at least once, namely it will finish. (Unless it is an infinite loop.)
In order for loops to make sense, they must have Side-Effects. However, a Purely Functional program must not have Side-Effects. Therefore, loops cannot possibly make sense in a Purely Functional program.

As an additional example from #Jörg, take this naive loop using imperative language written in Scala:
def printUpTo(limit: Int): Unit = {
  var i = 0
  while(i <= limit)
  {
    println("i = " + i)
    i += 1
    // in another part of the loop
    if (i % 5 == 0) { i += 1 } // ops. We should not evaluate "i" here.
  }
}
Inside this loop there is a variable declared as var i which is a state that is changed on each iteration. While this state change isn't visible from the outside (a new copy of the variable is created each time the function is entered), var often means there's unnecessary clutter in the code and that it can be simplified. And indeed it can.
As a functional programmer we must strive to use immutable states everywhere. In this loop example if someone changes the value of var i in another place, like in if (i % 5 == 0) { i += 1 } by lack of attention, it will be hard to debug and find. This is a side effect that we must avoid. So, using immutable states avoids these kind of bugs. Here is the same example using imutable state.
def printUpToFunc1(limit: Int): Unit = {
  for(i <- (0 to limit)) {
    println("i = " + i)
  }
}
And we can make the code more clear using only foreach:
def printUpToFunc2(limit: Int): Unit = {
  (0 to limit).foreach {
    i => println("i = " + i)
  }
}
And smaller...
def printUpToFunc3(limit: Int): Unit = (0 to limit).foreach(println)

All those are good answers. Just add one quick point if you come from another language.
Void Function
A function returns nothing, such as void, implies there is a side effect.
For example, if you have this code in c#
void Log (string message) => Logger.WriteLine(message);
This causes side effect, which writes something to a logger.
Does it matter? Probably you don't care. However, what about this?
def SubmitOrder(order: Order): Unit =
{
// code that submits an order
}
This will not be good. See later.
Why side effect is bad?
Other than some obvious reasons, including:
hard to reason: must read the entire function body to see what is happening;
mutable state: error prone and potentially not thread-safe
The most important thing is, it is annoying to test.
How to avoid side effect?
An easy way is always try to return something. (Of course, still try not to change the state internally, closure is fine).
For example, the previous example, if instead of Unit, we have:
def SubmitOrder(order: Order): Either[SubmittedOrder, OrderSubmissionError] =
{
// code that submits an order
}
this will be much better, it tells the reader there is a side effect and what might happen.
Side Effect in Loop
Now go back to your question about loop, without analyzing your real case, it is hard to suggest how to avoid side effect from a loop.
However, if you are writing a function and then you want to write a loop that calls the function, make sure that function does not modify a local variable or a state in somewhere else.

Related

Understanding the continuation theorem in Scala

So, I was trying to learn about Continuation. I came across with the following saying (link):
Say you're in the kitchen in front of the refrigerator, thinking about a sandwich. You take a continuation right there and stick it in your pocket. Then you get some turkey and bread out of the refrigerator and make yourself a sandwich, which is now sitting on the counter. You invoke the continuation in your pocket, and you find yourself standing in front of the refrigerator again, thinking about a sandwich. But fortunately, there's a sandwich on the counter, and all the materials used to make it are gone. So you eat it. :-) — Luke Palmer
Also, I saw a program in Scala:
var k1 : (Unit => Sandwich) = null
reset {
shift { k : Unit => Sandwich) => k1 = k }
makeSandwich
}
val x = k1()
I don't really know the syntax of Scala (looks similar to Java and C mixed together) but I would like to understand the concept of Continuation.
Firstly, I tried to run this program (by adding it into main). But it fails, I think that it has a syntax error due to the ) near Sandwich but I'm not sure. I removed it but it still does not compile.
How to create a fully compiled example that shows the concept of the story above?
How this example shows the concept of Continuation.
In the link above there was the following saying: "Not a perfect analogy in Scala because makeSandwich is not executed the first time through (unlike in Scheme)". What does it mean?
Since you seem to be more interested in the concept of the "continuation" rather than specific code, let's forget about that code for a moment (especially because it is quite old and I don't really like those examples because IMHO you can't understand them correctly unless you already know what a continuation is).
Note: this is a very long answer with some attempts to describe what a continuations is and why it is useful. There are some examples in Scala-like pseudo-code none of which can actually be compiled and run (there is just one compilable example at the very end and it references another example from the middle of the answer). Expect to spend a significant amount of time just reading this answer.
Intro to continuations
Probably the first thing you should do to understand a continuation is to forget about how modern compilers for most of the imperative languages work and how most of the modern CPUs work and particularly the idea of the call stack. This is actually implementation details (although quite popular and quite useful in practice).
Assume you have a CPU that can execute some sequence of instructions. Now you want to have a high level languages that support the idea of methods that can call each other. The obvious problem you face is that the CPU needs some "forward only" sequence of commands but you want some way to "return" results from a sub-program to the caller. Conceptually it means that you need to have some way to store somewhere before the call all the state of the caller method that is required for it to continue to run after the result of the sub-program is computed, pass it to the sub-program and then ask the sub-program at the end to continue execution from that stored state. This stored state is exactly a continuation. In most of the modern environments those continuations are stored on the call stack and often there are some assembly instructions specifically designed to help handling it (like call and return). But again this is just implementation details. Potentially they might be stored in an arbitrary way and it will still work.
So now let's re-iterate this idea: a continuation is a state of the program at some point that is enough to continue its execution from that point, typically with no additional input or some small known input (like a return value of the called method). Running a continuation is different from a method call in that usually continuation never explicitly returns execution control back to the caller, it can only pass it to another continuation. Potentially you can create such a state yourself, but in practice for the feature to be useful you need some support from the compiler to build continuations automatically or emulate it in some other way (this is why the Scala code you see requires a compiler plugin).
Asynchronous calls
Now there is an obvious question: why continuations are useful at all? Actually there are a few more scenarios besides the simple "return" case. One such scenario is asynchronous programming. Actually if you do some asynchronous call and provide a callback to handle the result, this can be seen as passing a continuation. Unfortunately most of the modern languages do not support automatic continuations so you have to grab all the relevant state yourself. Another problem appears if you have some logic that needs a sequence of many async calls. And if some of the calls are conditional, you easily get to the callbacks hell. The way continuations help you avoid it is by allowing you build a method with effectively inverted control flow. With typical call it is the caller that knows the callee and expects to get a result back in a synchronous way. With continuations you can write a method with several "entry points" (or "return to points") for different stages of the processing logic that you can just pass to some other method and that method can still return to exactly that position.
Consider following example (in pseudo-code that is Scala-like but is actually far from the real Scala in many details):
def someBusinessLogic() = {
val userInput = getIntFromUser()
val firstServiceRes = requestService1(userInput)
val secondServiceRes = if (firstServiceRes % 2 == 0) requestService2v1(userInput) else requestService2v2(userInput)
showToUser(combineUserInputAndResults(userInput,secondServiceRes))
}
If all those calls a synchronous blocking calls, this code is easy. But assume all those get and request calls are asynchronous. How to re-write the code? The moment you put the logic in callbacks you loose the clarity of the sequential code. And here is where continuations might help you:
def someBusinessLogicCont() = {
// the method entry point
val userInput
getIntFromUserAsync(cont1, captureContinuationExpecting(entry1, userInput))
// entry/return point after user input
entry1:
val firstServiceRes
requestService1Async(userInput, captureContinuationExpecting(entry2, firstServiceRes))
// entry/return point after the first request to the service
entry2:
val secondServiceRes
if (firstServiceRes % 2 == 0) {
requestService2v1Async(userInput, captureContinuationExpecting(entry3, secondServiceRes))
// entry/return point after the second request to the service v1
entry3:
} else {
requestService2v2Async(userInput, captureContinuationExpecting(entry4, secondServiceRes))
// entry/return point after the second request to the service v2
entry4:
}
showToUser(combineUserInputAndResults(userInput, secondServiceRes))
}
It is hard to capture the idea in a pseudo-code. What I mean is that all those Async method never return. The only way to continue execution of the someBusinessLogicCont is to call the continuation passed into the "async" method. The captureContinuationExpecting(label, variable) call is supposed to create a continuation of the current method at the label with the input (return) value bound to the variable. With such a re-write you still has a sequential-looking business logic even with all those asynchronous calls. So now for a getIntFromUserAsync the second argument looks like just another asynchronous (i.e. never-returning) method that just requires one integer argument. Let's call this type Continuation[T]
trait Continuation[T] {
def continue(value: T):Nothing
}
Logically Continuation[T] looks like a function T => Unit or rather T => Nothing where Nothing as the return type signifies that the call actually never returns (note, in actual Scala implementation such calls do return, so no Nothing there, but I think conceptually it is easy to think about no-return continuations).
Internal vs external iteration
Another example is a problem of iteration. Iteration can be internal or external. Internal iteration API looks like this:
trait CollectionI[T] {
def forEachInternal(handler: T => Unit): Unit
}
External iteration looks like this:
trait Iterator[T] {
def nextValue(): Option[T]
}
trait CollectionE[T] {
def forEachExternal(): Iterator[T]
}
Note: often Iterator has two method like hasNext and nextValue returning T but it will just make the story a bit more complicated. Here I use a merged nextValue returning Option[T] where the value None means the end of the iteration and Some(value) means the next value.
Assuming the Collection is implemented by something more complicated than an array or a simple list, for example some kind of a tree, there is a conflict here between the implementer of the API and the API user if you use typical imperative language. And the conflict is over the simple question: who controls the stack (i.e. the easy to use state of the program)? The internal iteration is easier for the implementer because he controls the stack and can easily store whatever state is needed to move to the next item but for the API user the things become tricky if she wants to do some aggregation of the stored data because now she has to save the state between the calls to the handler somewhere. Also you need some additional tricks to let the user stop the iteration at some arbitrary place before the end of the data (consider you are trying to implement find via forEach). Conversely the external iteration is easy for the user: she can store all the state necessary to process data in any way in local variables but the API implementer now has to store his state between calls to the nextValue somewhere else. So fundamentally the problem arises because there is only one place to easily store the state of "your" part of the program (the call stack) and two conflicting users for that place. It would be nice if you could just have two different independent places for the state: one for the implementer and another for the user. And continuations provide exactly that. The idea is that we can pass execution control between two methods back and forth using two continuations (one for each part of the program). Let's change the signatures to:
// internal iteration
// continuation of the iterator
type ContIterI[T] = Continuation[(ContCallerI[T], ContCallerLastI)]
// continuation of the caller
type ContCallerI[T] = Continuation[(T, ContIterI[T])]
// last continuation of the caller
type ContCallerLastI = Continuation[Unit]
// external iteration
// continuation of the iterator
type ContIterE[T] = Continuation[ContCallerE[T]]
// continuation of the caller
type ContCallerE[T] = Continuation[(Option[T], ContIterE[T])]
trait Iterator[T] {
def nextValue(cont : ContCallerE[T]): Nothing
}
trait CollectionE[T] {
def forEachExternal(): Iterator[T]
}
trait CollectionI[T] {
def forEachInternal(cont : ContCallerI[T]): Nothing
}
Here ContCallerI[T] type, for example, means that this is a continuation (i.e. a state of the program) the expects two input parameters to continue running: one of type T (the next element) and another of type ContIterI[T] (the continuation to switch back). Now you can see that the new forEachInternal and the new forEachExternal+Iterator have almost the same signatures. The only difference in how the end of the iteration is signaled: in one case it is done by returning None and in other by passing and calling another continuation (ContCallerLastI).
Here is a naive pseudo-code implementation of a sum of elements in an array of Int using these signatures (an array is used instead of something more complicated to simplify the example):
class ArrayCollection[T](val data:T[]) : CollectionI[T] {
def forEachInternal(cont0 : ContCallerI[T], lastCont: ContCallerLastI): Nothing = {
var contCaller = cont0
for(i <- 0 to data.length) {
val contIter = captureContinuationExpecting(label, contCaller)
contCaller.continue(data(i), contIter)
label:
}
}
}
def sum(arr: ArrayCollection[Int]): Int = {
var sum = 0
val elem:Int
val iterCont:ContIterI[Int]
val contAdd0 = captureContinuationExpecting(labelAdd, elem, iterCont)
val contLast = captureContinuation(labelReturn)
arr.forEachInternal(contAdd0, contLast)
labelAdd:
sum += elem
val contAdd = captureContinuationExpecting(labelAdd, elem, iterCont)
iterCont.continue(contAdd)
// note that the code never execute this line, the only way to jump out of labelAdd is to call contLast
labelReturn:
return sum
}
Note how both implementations of the forEachInternal and of the sum methods look fairly sequential.
Multi-tasking
Cooperative multitasking also known as coroutines is actually very similar to the iterations example. Cooperative multitasking is an idea that the program can voluntarily give up ("yield") its execution control either to the global scheduler or to another known coroutine. Actually the last (re-written) example of sum can be seen as two coroutines working together: one doing iteration and another doing summation. But more generally your code might yield its execution to some scheduler that then will select which other coroutine to run next. And what the scheduler does is manages a bunch of continuations deciding which to continue next.
Preemptive multitasking can be seen as a similar thing but the scheduler is run by some hardware interruption and then the scheduler needs a way to create a continuation of the program being executed just before the interruption from the outside of that program rather than from the inside.
Scala examples
What you see is a really old article that is referring to Scala 2.8 (while current versions are 2.11, 2.12, and soon 2.13). As #igorpcholkin correctly pointed out, you need to use a Scala continuations compiler plugin and library. The sbt compiler plugin page has an example how to enable exactly that plugin (for Scala 2.12 and #igorpcholkin's answer has the magic strings for Scala 2.11):
val continuationsVersion = "1.0.3"
autoCompilerPlugins := true
addCompilerPlugin("org.scala-lang.plugins" % "scala-continuations-plugin_2.12.2" % continuationsVersion)
libraryDependencies += "org.scala-lang.plugins" %% "scala-continuations-library" % continuationsVersion
scalacOptions += "-P:continuations:enable"
The problem is that plugin is semi-abandoned and is not widely used in practice. Also the syntax has changed since the Scala 2.8 times so it is hard to get those examples running even if you fix the obvious syntax bugs like missing ( here and there. The reason of that state is stated on the GitHub as:
You may also be interested in https://github.com/scala/async, which covers the most common use case for the continuations plugin.
What that plugin does is emulates continuations using code-rewriting (I suppose it is really hard to implement true continuations over the JVM execution model). And under such re-writings a natural thing to represent a continuation is some function (typically called k and k1 in those examples).
So now if you managed to read the wall of text above, you can probably interpret the sandwich example correctly. AFAIU that example is an example of using continuation as means to emulate "return". If we re-sate it with more details, it could go like this:
You (your brain) are inside some function that at some points decides that it wants a sandwich. Luckily you have a sub-routine that knows how to make a sandwich. You store your current brain state as a continuation into the pocket and call the sub-routine saying to it that when the job is done, it should continue the continuation from the pocket. Then you make a sandwich according to that sub-routine messing up with your previous brain state. At the end of the sub-routine it runs the continuation from the pocket and you return to the state just before the call of the sub-routine, forget all your state during that sub-routine (i.e. how you made the sandwich) but you can see the changes in the outside world i.e. that the sandwich is made now.
To provide at least one compilable example with the current version of the scala-continuations, here is a simplified version of my asynchronous example:
case class RemoteService(private val readData: Array[Int]) {
private var readPos = -1
def asyncRead(callback: Int => Unit): Unit = {
readPos += 1
callback(readData(readPos))
}
}
def readAsyncUsage(rs1: RemoteService, rs2: RemoteService): Unit = {
import scala.util.continuations._
reset {
val read1 = shift(rs1.asyncRead)
val read2 = if (read1 % 2 == 0) shift(rs1.asyncRead) else shift(rs2.asyncRead)
println(s"read1 = $read1, read2 = $read2")
}
}
def readExample(): Unit = {
// this prints 1-42
readAsyncUsage(RemoteService(Array(1, 2)), RemoteService(Array(42)))
// this prints 2-1
readAsyncUsage(RemoteService(Array(2, 1)), RemoteService(Array(42)))
}
Here remote calls are emulated (mocked) with a fixed data provided in arrays. Note how readAsyncUsage looks like a totally sequential code despite the non-trivial logic of which remote service to call in the second read depending on the result of the first read.
For full example you need prepare Scala compiler to use continuations and also use a special compiler plugin and library.
The simplest way is a create a new sbt.project in IntellijIDEA with the following files: build.sbt - in the root of the project, CTest.scala - inside main/src.
Here is contents of both files:
build.sbt:
name := "ContinuationSandwich"
version := "0.1"
scalaVersion := "2.11.6"
autoCompilerPlugins := true
addCompilerPlugin(
"org.scala-lang.plugins" % "scala-continuations-plugin_2.11.6" % "1.0.2")
libraryDependencies +=
"org.scala-lang.plugins" %% "scala-continuations-library" % "1.0.2"
scalacOptions += "-P:continuations:enable"
CTest.scala:
import scala.util.continuations._
object CTest extends App {
case class Sandwich()
def makeSandwich = {
println("Making sandwich")
new Sandwich
}
var k1 : (Unit => Sandwich) = null
reset {
shift { k : (Unit => Sandwich) => k1 = k }
makeSandwich
}
val x = k1()
}
What the code above essentially does is calling makeSandwich function (in a convoluted manner). So execution result would be just printing "Making sandwich" into console. The same result would be achieved without continuations:
object CTest extends App {
case class Sandwich()
def makeSandwich = {
println("Making sandwich")
new Sandwich
}
val x = makeSandwich
}
So what's the point? My understanding is that we want to "prepare a sandwich", ignoring the fact that we may be not ready for that. We mark a point of time where we want to return to after all necessary conditions are met (i.e. we have all necessary ingredients ready). After we fetch all ingredients we can return to the mark and "prepare a sandwich", "forgetting that we were unable to do that in past". Continuations allow us to "mark point of time in past" and return to that point.
Now step by step. k1 is a variable to hold a pointer to a function which should allow to "create sandwich". We know it because k1 is declared so: (Unit => Sandwich).
However initially the variable is not initialized (k1 = null, "there are no ingredients to make a sandwich yet"). So we can't call the function preparing sandwich using that variable yet.
So we mark a point of execution where we want to return to (or point of time in past we want to return to) using "reset" statement.
makeSandwich is another pointer to a function which actually allows to make a sandwich. It's the last statement of "reset block" and hence it is passed to "shift" (function) as argument (shift { k : (Unit => Sandwich).... Inside shift we assign that argument to k1 variable k1 = k thus making k1 ready to be called as a function. After that we return to execution point marked by reset. The next statement is execution of function pointed to by k1 variable which is now properly initialized so finally we call makeSandwich which prints "Making sandwich" to a console. It also returns an instance of sandwich class which is assigned to x variable.
Not sure, probably it means that makeSandwich is not called inside reset block but just afterwards when we call it as k1().

Is there a reason why assignments in scala evaluate to Unit? [duplicate]

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.

Relations between different functional programming notions

I understand different notions of functional programming by itself: side effects, immutability, pure functions, referential transparency. But I can't connect them together in my head. For example, I have the following questions:
What is the relation between ref. transparency and immutability. Does one implies the other?
Sometimes side effects and immutability is used interchangeably. Is it correct?
This question requires some especially nit-picky answers, since it's about defining common vocabulary.
First, a function is a kind of mathematical relation between a "domain" of inputs and a "range" (or codomain) of outputs. Every input produces an unambiguous output. For example, the integer addition function + accepts input in the domain Int x Int and produces outputs in the range Int.
object Ex0 {
def +(x: Int, y: Int): Int = x + y
}
Given any values for x and y, clearly + will always produce the same result. This is a function. If the compiler were extra clever, it could insert code to cache the results of this function for each pair of inputs, and perform a cache lookup as an optimization. That's clearly safe here.
The problem is that in software, the term "function" has been somewhat abused: although functions accept arguments and return values as declared in their signature, they can also read and write to some external context. For example:
class Ex1 {
def +(x: Int): Int = x + Random.nextInt
}
We can't think of this as a mathematical function anymore, because for a given value of x, + can produce different results (depending on a random value, which doesn't appear anywhere in +'s signature). The result of + can not be safely cached as described above. So now we have a vocabulary problem, which we solve by saying that Ex0.+ is pure, and Ex1.+ isn't.
Okay, since we've now accepted some degree of impurity, we need to define what kind of impurity we're talking about! In this case, we've said the difference is that we can cache Ex0.+'s results associated with its inputs x and y, and that we can't cache Ex1.+'s results associated with its input x. The term we use to describe cacheability (or, more properly, substitutability of a function call with its output) is referential transparency.
All pure functions are referentially transparent, but some referentially transparent functions aren't pure. For example:
object Ex2 {
var lastResult: Int
def +(x: Int, y: Int): Int = {
lastResult = x + y
lastResult
}
}
Here we're not reading from any external context, and the value produced by Ex2.+ for any inputs x and y will always be cacheable, as in Ex0. This is referentially transparent, but it does have a side effect, which is to store the last value computed by the function. Somebody else can come along later and grab lastResult, which will give them some sneaky insight into what's been happening with Ex2.+!
A side note: you can also argue that Ex2.+ is not referentially transparent, because although caching is safe with respect to the function's result, the side effect is silently ignored in the case of a cache "hit." In other words, introducing a cache changes the program's meaning, if the side effect is important (hence Norman Ramsey's comment)! If you prefer this definition, then a function must be pure in order to be referentially transparent.
Now, one thing to observe here is that if we call Ex2.+ twice or more in a row with the same inputs, lastResult will not change. The side effect of invoking the method n times is equivalent to the side effect of invoking the method only once, and so we say that Ex2.+ is idempotent. We could change it:
object Ex3 {
var history: Seq[Int]
def +(x: Int, y: Int): Int = {
result = x + y
history = history :+ result
result
}
}
Now, each time we call Ex3.+, the history changes, so the function is no longer idempotent.
Okay, a recap so far: a pure function is one that neither reads from nor writes to any external context. It is both referentially transparent and side effect free. A function that reads from some external context is no longer referentially transparent, whereas a function that writes to some external context is no longer free of side effects. And finally, a function which when called multiple times with the same input has the same side effect as calling it only once, is called idempotent. Note that a function with no side effects, such as a pure function, is also idempotent!
So how does mutability and immutability play into all this? Well, look back at Ex2 and Ex3. They introduce mutable vars. The side effects of Ex2.+ and Ex3.+ is to mutate their respective vars! So mutability and side effects go hand-in-hand; a function that operates only on immutable data must be side effect free. It might still not be pure (that is, it might not be referentially transparent), but at least it will not produce side effects.
A logical follow-up question to this might be: "what are the benefits of a pure functional style?" The answer to that question is more involved ;)
"No" to the first - one implies the other, but not the converse, and a qualified "Yes" to the second.
"An expression is said to be referentially transparent if it can be replaced with its value without changing the behavior of a program".
Immutable input suggests that an expression (function) will always evaluate to the same value, and therefore be referentially transparent.
However, (mergeconflict has kindly correct me on this point) being referentially transparent does not necessarily require immutability.
By definition, a side-effect is an aspect of a function; meaning that when you call a function, it changes something.
Immutability is an aspect of data; it cannot be changed.
Calling a function on such does imply that there can be no side-effects. (in Scala, that's limited to "no changes to the immutable object(s)" - the developer has responsibilities and decisions).
While side-effect and immutability don't mean the same thing, they are closely related aspects of a function and the data the function is applied to.
Since Scala is not a pure functional programming language, one must be careful when considering the meaning of such statements as "immutable input" - the scope of the input to a function may include elements other than those passed as parameters. Similarly for considering side-effects.
It rather depends on the specific definitions you use (there can be disagreement, see for example Purity vs Referential transparency), but I think this is a reasonable interpretation:
Referential transparency and 'purity' are properties of functions/expressions. A function/expression may or may not have side-effects. Immutability, on the other hand, is a property of objects, not functions/expressions.
Referential transparency, side-effects and purity are closely related: 'pure' and 'referentially transparent' are equivalent, and those notions are equivalent to the absence of side-effects.
An immutable object may have methods which are not referentially transparent: those methods will not change the object itself (as that would make the object mutable), but may have other side-effects such as performing I/O or manipulating their (mutable) parameters.

Why does assignment return Unit? (Why aren't assignments chainable?) [duplicate]

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?

What is the motivation for Scala assignment evaluating to Unit rather than the value assigned?
A common pattern in I/O programming is to do things like this:
while ((bytesRead = in.read(buffer)) != -1) { ...
But this is not possible in Scala because...
bytesRead = in.read(buffer)
.. returns Unit, not the new value of bytesRead.
Seems like an interesting thing to leave out of a functional language.
I am wondering why it was done so?
I advocated for having assignments return the value assigned rather than unit. Martin and I went back and forth on it, but his argument was that putting a value on the stack just to pop it off 95% of the time was a waste of byte-codes and have a negative impact on performance.
I'm not privy to inside information on the actual reasons, but my suspicion is very simple. Scala makes side-effectful loops awkward to use so that programmers will naturally prefer for-comprehensions.
It does this in many ways. For instance, you don't have a for loop where you declare and mutate a variable. You can't (easily) mutate state on a while loop at the same time you test the condition, which means you often have to repeat the mutation just before it, and at the end of it. Variables declared inside a while block are not visible from the while test condition, which makes do { ... } while (...) much less useful. And so on.
Workaround:
while ({bytesRead = in.read(buffer); bytesRead != -1}) { ...
For whatever it is worth.
As an alternate explanation, perhaps Martin Odersky had to face a few very ugly bugs deriving from such usage, and decided to outlaw it from his language.
EDIT
David Pollack has answered with some actual facts, which are clearly endorsed by the fact that Martin Odersky himself commented his answer, giving credence to the performance-related issues argument put forth by Pollack.
This happened as part of Scala having a more "formally correct" type system. Formally-speaking, assignment is a purely side-effecting statement and therefore should return Unit. This does have some nice consequences; for example:
class MyBean {
private var internalState: String = _
def state = internalState
def state_=(state: String) = internalState = state
}
The state_= method returns Unit (as would be expected for a setter) precisely because assignment returns Unit.
I agree that for C-style patterns like copying a stream or similar, this particular design decision can be a bit troublesome. However, it's actually relatively unproblematic in general and really contributes to the overall consistency of the type system.
Perhaps this is due to the command-query separation principle?
CQS tends to be popular at the intersection of OO and functional programming styles, as it creates an obvious distinction between object methods that do or do not have side-effects (i.e., that alter the object). Applying CQS to variable assignments is taking it further than usual, but the same idea applies.
A short illustration of why CQS is useful: Consider a hypothetical hybrid F/OO language with a List class that has methods Sort, Append, First, and Length. In imperative OO style, one might want to write a function like this:
func foo(x):
var list = new List(4, -2, 3, 1)
list.Append(x)
list.Sort()
# list now holds a sorted, five-element list
var smallest = list.First()
return smallest + list.Length()
Whereas in more functional style, one would more likely write something like this:
func bar(x):
var list = new List(4, -2, 3, 1)
var smallest = list.Append(x).Sort().First()
# list still holds an unsorted, four-element list
return smallest + list.Length()
These seem to be trying to do the same thing, but obviously one of the two is incorrect, and without knowing more about the behavior of the methods, we can't tell which one.
Using CQS, however, we would insist that if Append and Sort alter the list, they must return the unit type, thus preventing us from creating bugs by using the second form when we shouldn't. The presence of side effects therefore also becomes implicit in the method signature.
I'd guess this is in order to keep the program / the language free of side effects.
What you describe is the intentional use of a side effect which in the general case is considered a bad thing.
It is not the best style to use an assignment as a boolean expression. You perform two things at the same time which leads often to errors. And the accidential use of "=" instead of "==" is avoided with Scalas restriction.
By the way: I find the initial while-trick stupid, even in Java. Why not somethign like this?
for(int bytesRead = in.read(buffer); bytesRead != -1; bytesRead = in.read(buffer)) {
//do something
}
Granted, the assignment appears twice, but at least bytesRead is in the scope it belongs to, and I'm not playing with funny assignment tricks...
You can have a workaround for this as long as you have a reference type for indirection. In a naïve implementation, you can use the following for arbitrary types.
case class Ref[T](var value: T) {
def := (newval: => T)(pred: T => Boolean): Boolean = {
this.value = newval
pred(this.value)
}
}
Then, under the constraint that you’ll have to use ref.value to access the reference afterwards, you can write your while predicate as
val bytesRead = Ref(0) // maybe there is a way to get rid of this line
while ((bytesRead := in.read(buffer)) (_ != -1)) { // ...
println(bytesRead.value)
}
and you can do the checking against bytesRead in a more implicit manner without having to type it.