Folding lists in scala - scala

Folding list in scala using /: and :\ operator
I tried to to look at different sites and they only talk about foldRight and foldLeft functions.
def sum(xs: List[Int]): Int = (0 /: xs) (_ + _)
sum(List(1,2,3))
res0: 6
The code segment works as described. But I am not able to completely understand the method definition. What I understand is that the one inside the first parenthesis -> 0 /: xs where /: is a right associate operator. The object is xs and the parameter is 0. I am not sure about the return type of the operation (most probably it would be another list?). The second part is a functional piece which sums its two parameters. But I don't understand what object invokes it ? and the name of function. Can someone please help me to understand.

The signature of :/ is
/:[B](z: B)(op: (B, A) ⇒ B): B
It is a method with multiple argument lists, so when it is just invoked with on argument (i.e. 0 /: xs in your case) the return type is (op: (B, A) ⇒ B): B. So you have to pass it a method with 2 parameters ( _ + _ ) that is used to combine the elements of the list starting from z.
This method is usually called foldLeft:
(0 /: xs)(_ + _) is the same as xs.foldLeft(0)(_ + _)
You can find more details here: https://www.scala-lang.org/api/2.12.3/scala/collection/immutable/List.html

Thanks #HaraldGliebe & #LuisMiguelMejíaSuárez for your great responses. I am enlightened now!. I am just summarisig the answer here which may benefit others who read this thread.
"/:" is actually the name of the function which is defined inside the List class. The signature of the function is: /:[B](z: B)(op: (B, A) ⇒ B): B --> where B is the type parameter, z is the first parameter; op is the second parameter which is of functional type.
The function follows curried version --> which means we can pass less number of parameters than that of the actual number. If we do that,
the partially applied function is stored in a temporary variable; we can then use the temporary variable to pass the remaining parameters.
If supplied with all parameters, "/:" can be called as: x./:(0)(_+_) where x is val/var of List type. OR "/:" can be called in two steps which are given as:
step:1 val temp = x./:(0)(_) where we pass only the first parameter. This results in a partially applied function which is stored in the temp variable.
step:2 temp(_+_) here using the partially applied function temp is passed with the second (final) parameter.
If we decide to follow the first style ( x./:(0)(_+_) ), calling the first parameter can be written in operator notion which is: x /: 0
Since the method name ends with a colon, the object will be pulled from right side. So x /: 0 is invalid and it has to be written as 0 /: x which is correct.
This one is equivalent to the temp variable. On following 0 /: x, second parameter also needs to be passed. So the whole construct becomes: (0/:x)(_+_)
This is how the definition of the function sum in the question, is interpreted.
We have to note that when we use curried version of the function in operator notion, we have to supply all the parameters in a single go.
That is: (0 /: x) (_) OR (0 /: x) _ seems throwing syntax errors.

Related

What does an underscore after a scala method call mean?

The scala documentation has a code example that includes the following line:
val numberFunc = numbers.foldLeft(List[Int]())_
What does the underscore after the method call mean?
It's a partially applied function. You only provide the first parameter to foldLeft (the initial value), but you don't provide the second one; you postpone it for later. In the docs you linked they do it in the next line, where they define squares:
val numberFunc = numbers.foldLeft(List[Int]())_
val squares = numberFunc((xs, x) => xs:+ x*x)
See that (xs, x) => xs:+ x*x, that's the missing second parameter which you omitted while defining numberFunc. If you had provided it right away, then numberFunc would not be a function - it would be the computed value.
So basically the whole thing can also be written as a one-liner in the curried form:
val squares = numbers.foldLeft(List[Int]())((xs, x) => xs:+ x*x)
However, if you want to be able to reuse foldLeft over and over again, having the same collection and initial value, but providing a different function every time, then it's very convinient to define a separate numbersFunc (as they did in the docs) and reuse it with different functions, e.g.:
val squares = numberFunc((xs, x) => xs:+ x*x)
val cubes = numberFunc((xs, x) => xs:+ x*x*x)
...
Note that the compiler error message is pretty straightforward in case you forget the underscore:
Error: missing argument list for method foldLeft in trait
LinearSeqOptimized Unapplied methods are only converted to functions
when a function type is expected. You can make this conversion
explicit by writing foldLeft _ or foldLeft(_)(_) instead of
foldLeft. val numberFunc = numbers.foldLeft(ListInt)
EDIT: Haha I just realized that they did the exact same thing with cubes in the documentation.
I don't know if it helps but I prefer this syntax
val numberFunc = numbers.foldLeft(List[Int]())(_)
then numberFunc is basically a delegate corresponding to an instance method (instance being numbers) waiting for a parameter. Which later comes to be a lambda expression in the scala documentation example

Scala currying example in the tutorial is confusing me

I'm reading a bit about Scala currying here and I don't understand this example very much:
def foldLeft[B](z: B)(op: (B, A) => B): B
What is the [B] in square brackets? Why is it in brackets? The B after the colon is the return type right? What is the type?
It looks like this method has 2 parameter lists: one with a parameter named z and one with a parameter named op which is a function.
op looks like it takes a function (B, A) => B). What does the right side mean? It returns B?
And this is apparently how it is used:
val numbers = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val res = numbers.foldLeft(0)((m, n) => m + n)
print(res) // 55
What is going on? Why wasn't the [B] needed when called?
In Scala documentation that A type (sometimes A1) is often the placeholder for a collection's element type. So if you have...
List('c','q','y').foldLeft( //....etc.
...then A becomes the reference for Char because the list is a List[Char].
The B is a placeholder for a 2nd type that foldLeft will have to deal with. Specifically it is the type of the 1st parameter as well as the type of the foldLeft result. Most of the time you actually don't need to specify it when foldLeft is invoked because the compiler will infer it. So, yeah, it could have been...
numbers.foldLeft[Int](0)((m, n) => m + n)
...but why bother? The compiler knows it's an Int and so does anyone reading it (anyone who knows Scala).
(B, A) => B is a function that takes 2 parameters, one of type B and one of type A, and produces a result of type B.
What is the [B] in square brackets?
A type parameter (that's also known as "generics" if you've seen something like Java before)
Why is it in brackets?
Because type parameters are written in brackets, that's just Scala syntax. Java, C#, Rust, C++ use angle brackets < > for similar purposes, but since arrays in Scala are accessed as arr(idx), and (unlike Haskell or Python) Scala does not use [ ... ] for list comprehensions, the square brackets could be used for type parameters, and there was no need for angular brackets (those are more difficult to parse anyway, especially in a language which allows almost arbitrary names for infix and postfix operators).
The B after the colon is the return type right?
Right.
What is the type?
Ditto. The return type is B.
It looks like this method has 2 parameter lists: one with a parameter named z and one with a parameter named op which is a function.
This method has a type parameter B and two argument lists for value arguments, correct. This is done to simplify the type inference: the type B can be inferred from the first argument z, so it does not have to be repeated when writing down the lambda-expression for op. This wouldn't work if z and op were in the same argument list.
op looks like it takes a function (B, A) => B.
The type of the argument op is (B, A) => B, that is, Function2[B, A, B], a function that takes a B and an A and returns a B.
What does the right side mean? It returns B?
Yes.
What is going on?
m acts as accumulator, n is the current element of the list. The fold starts with integer value 0, and then accumulates from left to right, adding up all numbers. Instead of (m, n) => m + n, you could have written _ + _.
Why wasn't the [B] needed when called?
It was inferred from the type of z. There are many other cases where the generic type cannot be inferred automatically, then you would have to specify the return type explicitly by passing it as an argument in the square brackets.
This is what is called polymorphism. The function can work on multiple types and you sometimes want to give a parameter of what type will be worked with. Basically the B is a type parameter and can either be given explicitly as a type, which would be Int and then it should be given in square brackets or implicitly in parentheses like you did with the 0. Read about polymorphism here Scala polymorphism

Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:
val sampleMBR = inputMBR.map(_._2).sample
inputMBR is a tuple.
the function map's definition is :
map[U classTag](f:T=>U):RDD[U]
it seems that map(_._2) is the shorthand for map(x => (x._2)).
Anyone can tell me rules of those shorthand ?
The _ syntax can be a bit confusing. When _ is used on its own it represents an argument in the anonymous function. So if we working on pairs:
map(_._2 + _._2) would be shorthand for map(x, y => x._2 + y._2). When _ is used as part of a function name (or value name) it has no special meaning. In this case x._2 returns the second element of a tuple (assuming x is a tuple).
collection.map(_._2) emits a second component of the tuple. Example from pure Scala (Spark RDDs work the same way):
scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))
scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
Two underscores in '_._2' are different.
First '_' is for placeholder of anonymous function; Second '_2' is member of case class Tuple.
Something like:
case class Tuple3 (_1: T1, _2: T2, _3: T3)
{...}
I have found the solutions.
First the underscore here is as placeholder.
To make a function literal even more concise, you can use underscores
as placeholders for one or more parameters, so long as each parameter
appears only one time within the function literal.
See more about underscore in Scala at What are all the uses of an underscore in Scala?.
The first '_' is referring what is mapped to and since what is mapped to is a tuple you might call any function within the tuple and one of the method is '_2' so what below tells us transform input into it's second attribute.

difference between foldLeft and reduceLeft in Scala

I have learned the basic difference between foldLeft and reduceLeft
foldLeft:
initial value has to be passed
reduceLeft:
takes first element of the collection as initial value
throws exception if collection is empty
Is there any other difference ?
Any specific reason to have two methods with similar functionality?
Few things to mention here, before giving the actual answer:
Your question doesn't have anything to do with left, it's rather about the difference between reducing and folding
The difference is not the implementation at all, just look at the signatures.
The question doesn't have anything to do with Scala in particular, it's rather about the two concepts of functional programming.
Back to your question:
Here is the signature of foldLeft (could also have been foldRight for the point I'm going to make):
def foldLeft [B] (z: B)(f: (B, A) => B): B
And here is the signature of reduceLeft (again the direction doesn't matter here)
def reduceLeft [B >: A] (f: (B, A) => B): B
These two look very similar and thus caused the confusion. reduceLeft is a special case of foldLeft (which by the way means that you sometimes can express the same thing by using either of them).
When you call reduceLeft say on a List[Int] it will literally reduce the whole list of integers into a single value, which is going to be of type Int (or a supertype of Int, hence [B >: A]).
When you call foldLeft say on a List[Int] it will fold the whole list (imagine rolling a piece of paper) into a single value, but this value doesn't have to be even related to Int (hence [B]).
Here is an example:
def listWithSum(numbers: List[Int]) = numbers.foldLeft((List.empty[Int], 0)) {
(resultingTuple, currentInteger) =>
(currentInteger :: resultingTuple._1, currentInteger + resultingTuple._2)
}
This method takes a List[Int] and returns a Tuple2[List[Int], Int] or (List[Int], Int). It calculates the sum and returns a tuple with a list of integers and it's sum. By the way the list is returned backwards, because we used foldLeft instead of foldRight.
Watch One Fold to rule them all for a more in depth explanation.
reduceLeft is just a convenience method. It is equivalent to
list.tail.foldLeft(list.head)(_)
foldLeft is more generic, you can use it to produce something completely different than what you originally put in. Whereas reduceLeft can only produce an end result of the same type or super type of the collection type. For example:
List(1,3,5).foldLeft(0) { _ + _ }
List(1,3,5).foldLeft(List[String]()) { (a, b) => b.toString :: a }
The foldLeft will apply the closure with the last folded result (first time using initial value) and the next value.
reduceLeft on the other hand will first combine two values from the list and apply those to the closure. Next it will combine the rest of the values with the cumulative result. See:
List(1,3,5).reduceLeft { (a, b) => println("a " + a + ", b " + b); a + b }
If the list is empty foldLeft can present the initial value as a legal result. reduceLeft on the other hand does not have a legal value if it can't find at least one value in the list.
For reference, reduceLeft will error if applied to an empty container with the following error.
java.lang.UnsupportedOperationException: empty.reduceLeft
Reworking the code to use
myList foldLeft(List[String]()) {(a,b) => a+b}
is one potential option. Another is to use the reduceLeftOption variant which returns an Option wrapped result.
myList reduceLeftOption {(a,b) => a+b} match {
case None => // handle no result as necessary
case Some(v) => println(v)
}
The basic reason they are both in Scala standard library is probably because they are both in Haskell standard library (called foldl and foldl1). If reduceLeft wasn't, it would quite often be defined as a convenience method in different projects.
From Functional Programming Principles in Scala (Martin Odersky):
The function reduceLeft is defined in terms of a more general function, foldLeft.
foldLeft is like reduceLeft but takes an accumulator z, as an additional parameter, which is returned when foldLeft is called on an empty list:
(List (x1, ..., xn) foldLeft z)(op) = (...(z op x1) op ...) op x
[as opposed to reduceLeft, which throws an exception when called on an empty list.]
The course (see lecture 5.5) provides abstract definitions of these functions, which illustrates their differences, although they are very similar in their use of pattern matching and recursion.
abstract class List[T] { ...
def reduceLeft(op: (T,T)=>T) : T = this match{
case Nil => throw new Error("Nil.reduceLeft")
case x :: xs => (xs foldLeft x)(op)
}
def foldLeft[U](z: U)(op: (U,T)=>U): U = this match{
case Nil => z
case x :: xs => (xs foldLeft op(z, x))(op)
}
}
Note that foldLeft returns a value of type U, which is not necessarily the same type as List[T], but reduceLeft returns a value of the same type as the list).
To really understand what are you doing with fold/reduce,
check this: http://wiki.tcl.tk/17983
very good explanation. once you get the concept of fold,
reduce will come together with the answer above:
list.tail.foldLeft(list.head)(_)
Scala 2.13.3, Demo:
val names = List("Foo", "Bar")
println("ReduceLeft: "+ names.reduceLeft(_+_))
println("ReduceRight: "+ names.reduceRight(_+_))
println("Fold: "+ names.fold("Other")(_+_))
println("FoldLeft: "+ names.foldLeft("Other")(_+_))
println("FoldRight: "+ names.foldRight("Other")(_+_))
outputs:
ReduceLeft: FooBar
ReduceRight: FooBar
Fold: OtherFooBar
FoldLeft: OtherFooBar
FoldRight: FooBarOther

Scala underscore minimal function

Let's create a value for the sake of this question:
val a = 1 :: Nil
now, I can demonstrate that the anonymous functions can be written in shorthand form like this:
a.map(_*2)
is it possible to write a shorthand of this function?:
a.map((x) => x)
my solution doesn't work:
a.map(_)
For the record, a.map(_) does not work because it stands for x => a.map(x), and not a.map(x => x). This happens because a single _ in place of a parameter stands for a partially applied function. In the case of 2*_, that stands for an anonymous function. These two uses are so close that is very common to get confused by them.
Your first shorthand form can also be written point-free
a map (2*)
Thanks to multiplication being commutative.
As for (x) => x, you want the identity function. This is defined in Predef and is generic, so you can be sure that it's type-safe.
You should use identity function for this use case.
a.map(identity)
identity is defined in scala.Predef as:
implicit def identity[A](x: A): A = x