Scala: why does flatMap treat (x => x) differently from (identity)?

First, map treats x => x identically to identity:
List(1,2,3).map(x => x) //res0: List[Int] = List(1, 2, 3)
List(1,2,3).map(identity) //res1: List[Int] = List(1, 2, 3)
Now let's transform a List[Option[Int]] into a List[Int], discarding all None values. We can do that with .flatten, but the point of this question is to understand how flatMap treats identity.
val maybeNums = List(Some(1), None, Some(-2), None, None, Some(33))
// Works OK, result = List[Int] = List(1, -2, 33)
maybeNums.flatMap(x => x)
maybeNums.flatMap(x => x.map(identity))
// Not working:
maybeNums.flatMap(identity)
Error:(5, 20) type mismatch;
found : Option[Int] => Option[Int]
required: Option[Int] => scala.collection.GenTraversableOnce[?]
Question: why does maybeNums.flatMap(identity) give a compilation error while maybeNums.flatMap(x => x) works OK?

Funny thing, I had a similar problem. This behavior is caused by the fact that Option is not a GenTraversableOnce, although an implicit conversion to one exists. The compiler tells you what is wrong but, as is frequently the case in Scala, it does not point to the real cause of the mistake. If your collection contained elements of a GenTraversableOnce type, flatMap would work just fine.
At first I thought that the implicit conversion would solve this problem, but it turns out that eta expansion needs the types to match explicitly. What's more interesting is that the following code compiles:
maybeNums.flatMap(identity(_))
I assume that in this case an implicit conversion from Option[Int] => Option[Int] to Option[Int] => GenTraversableOnce[Int] happens.
In the case of x => x, the x on the right-hand side is converted by the previously mentioned implicit conversion, so the code compiles.

A bit more detailed explanation of Lampart's answer.
It comes down to how type inference works in Scala and use of expected types.
For maybeNums.flatMap(???) to work, ??? must have type Option[Int] => GenTraversableOnce[?A] (where ?A stands for some unknown type). That's the expected type.
When ??? is a lambda expression like x => x, the argument is typed as Option[Int] and the body is typed with the expected type GenTraversableOnce[?A]. The type of the body without expected type is Option[Int], so the implicit conversion from Option[Int] to GenTraversableOnce[Int] is found and inserted.
When ??? is identity, it's short for identity[?B] for some yet unknown type ?B (in the same sense as ?A above, but they don't have to be the same, of course), and so has type ?B => ?B. So the compiler needs to solve an equation with two unknown types: ?B => ?B == Option[Int] => GenTraversableOnce[?A]. It matches the argument types and picks ?B = Option[Int], but then can't find a suitable ?A, because Option[Int] is not itself a GenTraversableOnce[?A] for any ?A. When the error is printed, ?B is substituted and the type which I wrote as ?A above is printed as ? (because it's the only remaining unknown).
In the case of identity(_), it's expanded to x => identity(x). Again, the type of the argument is inferred to be Option[Int], and the body has calculated type Option[Int] and expected type GenTraversableOnce[?A], so the implicit conversion is inserted just as for x => x.
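As a minimal sketch of the forms this thread reports as compiling, the snippet below collects them in one place. The object name FlatMapIdentityDemo and the val names are made up for illustration; only maybeNums comes from the question.

```scala
object FlatMapIdentityDemo {
  // Same data as in the question.
  val maybeNums: List[Option[Int]] =
    List(Some(1), None, Some(-2), None, None, Some(33))

  // Lambda: the body `x` is typed against the expected result type,
  // so an Option[Int] is accepted as the collection to flatten.
  val viaLambda: List[Int] = maybeNums.flatMap(x => x)

  // Placeholder syntax expands to x => identity(x), which is again a lambda.
  val viaPlaceholder: List[Int] = maybeNums.flatMap(identity(_))

  // flatten avoids passing a function at all.
  val viaFlatten: List[Int] = maybeNums.flatten

  def main(args: Array[String]): Unit = {
    println(viaLambda)
    println(viaPlaceholder)
    println(viaFlatten)
  }
}
```

All three vals should hold the same list, with the None entries dropped.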

Related

Scala Map's get vs apply operation: "type mismatch"

I am learning Scala and found the following:
List(('a', 1)).toMap get 'a' // Option[Int] = Some(1)
(List(('a', 1)).toMap) apply 'a' // Int = 1
(List(('a', 1)).toMap)('a') // Error: type mismatch;
found : Char('a')
required: <:<[(Char, Int),(?, ?)]
(List(('a', 1)).toMap)('a')
But then assigning it to a variable works again.
val b = (List(('a', 1)).toMap)
b('a') // Int = 1
Why is this so?
The standard docs give:
ms get k
The value associated with key k in map ms as an option, None if not found.
ms(k) (or, written out, ms apply k)
The value associated with key k in map ms, or exception if not found.
Why doesn't the third line work?
It's essentially just an idiosyncratic collision of implicit arguments with apply-syntactic sugar and strange parentheses-elimination behavior.
As explained here, the parentheses in
(List(('a', 1)).toMap)('a')
are discarded a bit too early, so that you end up with
List(('a', 1)).toMap('a')
so that the compiler attempts to interpret 'a' as an implicit evidence of (Char, Int) <:< (?, ?) for some unknown types ?, ?.
This here works (it's not useful, it's just to demonstrate what the compiler would usually expect at this position):
(List(('a', 1)).toMap(implicitly[(Char, Int) <:< (Char, Int)]))('a')
Assigning List(...).toMap to a variable also works:
({val l = List((1, 2)).toMap; l})(1)
Alternatively, you could force toMap to stop accepting arguments by passing it through the identity function, which does nothing:
identity(List((1, 2)).toMap)(1)
But the easiest and clearest way to disambiguate implicit arguments and apply-syntactic sugar is to just write out .apply explicitly:
List((1, 2)).toMap.apply(1)
I think at this point it should be obvious why .get behaves differently, so I won't elaborate on that.
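The workarounds above can be put side by side in a small sketch. The object name MapApplyDemo and the val names are hypothetical; the expressions themselves mirror the ones shown in this answer and in the question.

```scala
object MapApplyDemo {
  val pairs: List[(Char, Int)] = List(('a', 1))

  // Binding the map to a name first: m('a') is plain apply sugar on a Map value.
  val m: Map[Char, Int] = pairs.toMap
  val viaVal: Int = m('a')

  // Writing .apply explicitly, so 'a' cannot be taken for toMap's implicit argument.
  val viaApply: Int = pairs.toMap.apply('a')

  // get is an ordinary method call, so it never collides with the implicit argument list.
  val viaGet: Option[Int] = pairs.toMap.get('a')

  def main(args: Array[String]): Unit = {
    println(viaVal)
    println(viaApply)
    println(viaGet)
  }
}
```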
The signature is slightly different:
abstract def get(key: K): Option[V]
def apply(key: K): V
The issue is error handling: get will return None when an element is not found and apply will throw an exception:
scala> Map(1 -> 2).get(3)
res0: Option[Int] = None
scala> Map(1 -> 2).apply(3)
java.util.NoSuchElementException: key not found: 3
at scala.collection.immutable.Map$Map1.apply(Map.scala:111)
... 36 elided
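A small sketch of the same difference without crashing the program: the failing apply call is wrapped in scala.util.Try, and getOrElse is shown as a third option. The object name MapLookupDemo, the val names and the default value 0 are assumptions for illustration.

```scala
import scala.util.Try

object MapLookupDemo {
  val m: Map[Int, Int] = Map(1 -> 2)

  // get wraps the result in an Option.
  val found: Option[Int] = m.get(1)
  val missing: Option[Int] = m.get(3)

  // getOrElse supplies a fallback for a missing key (0 is an arbitrary choice).
  val withDefault: Int = m.getOrElse(3, 0)

  // apply throws NoSuchElementException for a missing key; Try captures it as a Failure.
  val applyMissing: Try[Int] = Try(m(3))

  def main(args: Array[String]): Unit = {
    println(found)
    println(missing)
    println(withDefault)
    println(applyMissing.isFailure)
  }
}
```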
Regarding the failing line: toMap has an implicit argument ev: A <:< (K,V) expressing a type constraint. When you call r.toMap('a') you are passing an explicit value for the implicit but it has the wrong type. Scala 2.13.0 has a companion object <:< that provides a reflexivity method (using the given type itself instead of a proper sub-type). Now the following works:
scala> List(('a', 1)).toMap(<:<.refl)('a')
res3: Int = 1
Remark: I could not invoke <:<.refl in Scala 2.12.7; the addition seems to be quite recent.

Difference between these two function formats

I am working with Spark and am not an expert in Scala. I have two variants of a map function. Could you please explain the difference between them?
First variant, the familiar format:
val.map( (x,y) => x.size())
Second variant, applied to a tuple:
val.map({case (x, y) => y.toString()});
The type of val is RDD[(IntWritable, Text)]. When I tried the first function, it gave the error below.
type mismatch;
found : (org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text) ⇒ Unit
required: ((org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text)) ⇒ Unit
When I added extra parentheses, it said:
Tuples cannot be directly destructured in method or function parameters.
Well you say:
The type of val is RDD[(IntWritable, Text)]
so it is a tuple of arity 2 with IntWritable and Text as components.
If you say
val.map( (x,y) => x.size())
what you're doing is you are essentially passing in a Function2, a function with two arguments to the map function. This will never compile because map wants a function with one argument. What you can do is the following:
val.map((xy: (IntWritable, Text)) => xy._2.toString)
using ._2 to get the second part of the tuple which is passed in as xy (the type annotation is not required but makes it more clear).
Now the second variant (you can leave out the outer parens):
val.map { case (x, y) => y.toString() }
This is special Scala syntax for a pattern-matching anonymous function: it takes the single tuple argument and immediately matches on it to access the x and y parts. Depending on the expected type, the compiler turns such a block into either a regular Function1 (Function1[A, B] can be written as A => B) or a PartialFunction, which itself extends Function1. Since map expects a function with one argument, the block fits.
Hope that makes it more clear :)
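A minimal sketch of both working forms, using a plain List[(Int, String)] as a stand-in for the RDD[(IntWritable, Text)] so that it runs without Spark or Hadoop. That substitution, the object name TupleMapDemo and the val names are assumptions for illustration.

```scala
object TupleMapDemo {
  // Stand-in for the RDD of (IntWritable, Text) pairs from the question.
  val records: List[(Int, String)] = List((1, "one"), (2, "two"))

  // One-argument function: the whole tuple arrives as kv, fields are read with ._2.
  val viaAccessor: List[String] = records.map(kv => kv._2.toString)

  // Pattern-matching anonymous function: the tuple is destructured in the case clause.
  val viaCase: List[String] = records.map { case (_, text) => text.toString }

  def main(args: Array[String]): Unit = {
    println(viaAccessor)
    println(viaCase)
  }
}
```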
I tried this in the REPL:
scala> val l = List(("firstname", "tom"), ("secondname", "kate"))
l: List[(String, String)] = List((firstname,tom), (secondname,kate))
scala> l.map((x, y) => x.size)
<console>:9: error: missing parameter type
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, `{ case (x, y) => ... }`
l.map((x, y) => x.size)
Maybe this gives you a hint.
Your first example is a function that takes two arguments and returns a String. This is similar to this example:
scala> val f = (x:Int,y:Int) => x + y
f: (Int, Int) => Int = <function2>
You can see that the type of f is (Int,Int) => Int (just slightly changed this to be returning an int instead of a string). Meaning that this is a function that takes two Int as arguments and returns an Int as a result.
Now the second example you have is syntactic sugar (a shortcut) for writing something like this:
scala> val g = (k: (Int, Int)) => k match { case (x: Int, y: Int) => x + y }
g: ((Int, Int)) => Int = <function1>
You see that the type of function g is now ((Int, Int)) => Int. Can you spot the difference? The input type of g has two pairs of parentheses. This shows that g takes one argument and that argument must be a Tuple2[Int, Int] (or (Int, Int) for short).
Going back to your RDD, what you have is a collection of Tuple2[IntWritable, Text], so the second function will work, whereas the first one will not.
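If a two-argument function already exists, one way to reuse it on a collection of pairs is Function2's tupled method, which turns (A, B) => C into ((A, B)) => C. A short sketch, with the object name TupledDemo and the val names pairs, viaG and viaTupled made up for illustration:

```scala
object TupledDemo {
  // Function2: two separate Int arguments.
  val f: (Int, Int) => Int = (x, y) => x + y

  // Function1 whose single argument is a pair.
  val g: ((Int, Int)) => Int = { case (x, y) => x + y }

  val pairs: List[(Int, Int)] = List((1, 2), (3, 4))

  // g already has the shape map expects.
  val viaG: List[Int] = pairs.map(g)

  // f.tupled adapts the two-argument function to take one pair.
  val viaTupled: List[Int] = pairs.map(f.tupled)

  def main(args: Array[String]): Unit = {
    println(viaG)
    println(viaTupled)
  }
}
```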

Type mismatch with map and flatMap

While trying to play with Options in scala I have come across this peculiar problem.
I started off with creating a List[Option[Int]] as follows:
scala> List(Some(1),Some(2),None,Some(3))
res0: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))
Then I tried to map an addition to 1 over the entries of the list in res0 as follows:
scala> res0 map (_ + 1)
This gave me the error:
<console>:9: error: type mismatch;
found : Int(1)
required: String
res0 map (_ + 1)
^
Then I tried flatMapping an addition over the entries as follows:
scala> res0 flatMap (_ + 1)
This gave me the same error:
<console>:9: error: type mismatch;
found : Int(1)
required: String
res0 flatMap (_ + 1)
^
But something like res0.flatMap(r => r) works just fine with a result of:
res9: List[Int] = List(1, 2, 3)
Can anybody tell me why adding the entry to 1 would fail for both map and flatMap?
The first two things you tried failed because you are trying to add an Option to an Int, and that's not possible.
The weird error message happens because Scala assumes, since Option doesn't have a + method, that you are trying String concatenation, but you'd have to either add an Option to a String, or a String to an Option, and you are doing neither, hence the error message.
In the last case, you are not trying to add anything, you are simply returning Option as is, hence no error message.
To increment all values that are not None, you also need to map over each Option element of the list, like so:
scala> res0.map(_.map(_ + 1))
res1: List[Option[Int]] = List(Some(2), Some(3), None, Some(4))
If you want to filter out the Nones, you would indeed use a flatMap:
scala> res0.flatMap(_.map(_ + 1))
res2: List[Int] = List(2, 3, 4)
Both the function given to flatMap as well as the function given to map take a value of the list's element type - in this case Option[Int]. But your function _ + 1 expects an Int, not an Option[Int], so you can't use it as the argument to either map or flatMap in this case. Further the function given to flatMap should return an iterable¹, but your function would return a number.
This will do what you want: res0 flatMap (_ map (_ + 1)). Here the function given to flatMap takes an Option[Int] and returns an Option[Int] by calling map on the option. flatMap then takes the options returned by the function and concatenates them.
¹ Technically a GenTraversableOnce.
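The two fixes discussed so far, sketched as a standalone program. The object name OptionListDemo and the val names are hypothetical; xs plays the role of res0 from the question.

```scala
object OptionListDemo {
  // Same contents as res0 in the question.
  val xs: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))

  // Outer map walks the list, inner map reaches inside each Option; None stays None.
  val incremented: List[Option[Int]] = xs.map(_.map(_ + 1))

  // flatMap additionally unwraps the Options, so None entries disappear.
  val incrementedFlat: List[Int] = xs.flatMap(_.map(_ + 1))

  def main(args: Array[String]): Unit = {
    println(incremented)
    println(incrementedFlat)
  }
}
```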
You try to invoke e.+(1) on every element e in a List[Option[Int]], but + is not a function declared by Option[_]. String concatenation, however, would be possible (I assume there is an implicit from Any to String), but only if the second argument were a string as well (not sure why the implicit whose existence I assumed isn't considered here).
You can overcome this problem by working with a default value as suggested by @korefn, or "hide" the differentiation between Some(x) and None in another invocation of map, namely by
map(_.map(_ + 1))
They are failing because the types are wrong and the compiler correctly states it as such.
The map case fails because map expects a function A => B. In your code, A => B is really Int => Int which will not work because calling map on your list means that A is actually Option[Int].
Furthermore, flatMap expects a function of the form A => F[B]. So you will get your answer if you did res0 flatMap { o => o map { a => a + 1 } }. This basically is the expansion of:
for {
element <- res0 // o above
value <- element // a above
} yield value + 1
Try using getOrElse to extract the Int from each Option[Int], with a default for None, so that value + 1 can be computed, i.e.:
res0 map{_.getOrElse(0) + 1}
As pointed out by @Sepp2k, you could alternatively use collect to avoid having a default for None:
res0 collect {case Some(x) => x + 1 }
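The approaches in these last answers treat None differently, which a side-by-side sketch makes visible. The object name NoneHandlingDemo and the val names are made up for illustration, and the default 0 is the one used above.

```scala
object NoneHandlingDemo {
  val xs: List[Option[Int]] = List(Some(1), Some(2), None, Some(3))

  // getOrElse keeps one result per element; None becomes the default 0, then 1 is added.
  val withDefault: List[Int] = xs.map(_.getOrElse(0) + 1)

  // collect keeps only the elements matched by the case clause, so None is dropped.
  val viaCollect: List[Int] = xs.collect { case Some(x) => x + 1 }

  // The for comprehension is the flatMap/map form and also drops None.
  val viaFor: List[Int] = for {
    element <- xs
    value   <- element
  } yield value + 1

  def main(args: Array[String]): Unit = {
    println(withDefault)
    println(viaCollect)
    println(viaFor)
  }
}
```

Note that the getOrElse variant produces four results while the other two produce three, so the choice depends on whether a None should contribute a value.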

type inference in fold left one-liner?

I was trying to reverse a List of Integers as follows:
List(1,2,3,4).foldLeft(List[Int]()){(a,b) => b::a}
My question is: is there a way to specify the seed as some List[_], where the _ is filled in automatically by Scala's type-inference mechanism, instead of having to specify the type as List[Int]?
Thanks
Update: After reading a bit more on Scala's type inference, I found a better answer to your question. This article which is about the limitations of the Scala type inference says:
Type information in Scala flows from function arguments to their results [...], from left to right across argument lists, and from first to last across statements. This is in contrast to a language with full type inference, where (roughly speaking) type information flows unrestricted in all directions.
So the problem is that Scala's type inference is rather limited. It first looks at the first argument list (the list in your case) and then at the second argument list (the function). But it does not go back.
This is why neither this
List(1,2,3,4).foldLeft(Nil){(a,b) => b::a}
nor this
List(1,2,3,4).foldLeft(List()){(a,b) => b::a}
will work. Why? First, the signature of foldLeft is defined as:
foldLeft[B](z: B)(f: (B, A) => B): B
So if you use Nil as the first argument z, the compiler will assign Nil.type to the type parameter B. And if you use List(), the compiler will use List[Nothing] for B.
Now, the type of the second argument f is fully defined. In your case, it's either
(Nil.type, Int) => Nil.type
or
(List[Nothing], Int) => List[Nothing]
And in both cases the lambda expression (a, b) => b :: a is not valid, since its return type is inferred to be List[Int].
Note that the quote above says "argument lists" and not "arguments". The article later explains:
Type information does not flow from left to right within an argument list, only from left to right across argument lists.
So the situation is even worse if you have a method with a single argument list.
The only way I know how is
scala> def foldList[T](l: List[T]) = l.foldLeft(List[T]()){(a,b) => b::a}
foldList: [T](l: List[T])List[T]
scala> foldList(List(1,2,3,4))
res19: List[Int] = List(4, 3, 2, 1)
scala> foldList(List("a","b","c"))
res20: List[java.lang.String] = List(c, b, a)
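For comparison, here is a sketch of three ways to give the compiler the type of B before it reaches the second argument list. They all still name the element type somewhere, so they do not answer the original wish for full inference; they only show where the type can be supplied. The object name FoldLeftSeedDemo and the val names are hypothetical.

```scala
object FoldLeftSeedDemo {
  val nums: List[Int] = List(1, 2, 3, 4)

  // A typed empty list as the seed fixes B = List[Int].
  val viaEmpty: List[Int] =
    nums.foldLeft(List.empty[Int])((acc, x) => x :: acc)

  // Passing the type argument explicitly lets Nil be used as the seed.
  val viaTypeArg: List[Int] =
    nums.foldLeft[List[Int]](Nil)((acc, x) => x :: acc)

  // A type ascription on the seed has the same effect.
  val viaAscription: List[Int] =
    nums.foldLeft(Nil: List[Int])((acc, x) => x :: acc)

  def main(args: Array[String]): Unit = {
    println(viaEmpty)
    println(viaTypeArg)
    println(viaAscription)
  }
}
```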

Unexpected Scala pattern matching syntax

I had a List of Scala tuples like the following:
val l = List((1,2),(2,3),(3,4))
and I wanted to map it to a list of Int where each item is the sum of the Ints in the corresponding tuple. I also didn't want to use the x._1 notation, so I solved the problem with pattern matching like this:
def addTuple(t: (Int, Int)) : Int = t match {
case (first, second) => first + second
}
var r = l map addTuple
Doing that I obtained the list r: List[Int] = List(3, 5, 7) as expected. At this point, almost by accident, I discovered that I can achieve the same result with an abbreviated form like the following:
val r = l map {case(first, second) => first + second}
I cannot find any reference to this syntax in the documentation I have. Is that normal? Am I missing something trivial?
See Section 8.5 of the language reference, "Pattern Matching Anonymous Functions".
An anonymous function can be defined by a sequence of cases
{ case p1 => b1 ... case pn => bn }
which appear as an expression without a prior match. The expected type of such an expression must in part be defined. It must be either scala.Functionk[S1, ..., Sk, R] for some k > 0, or scala.PartialFunction[S1, R], where the argument type(s) S1, ..., Sk must be fully determined, but the result type R may be undetermined.
The expected type determines whether this is translated to a FunctionN or a PartialFunction.
scala> {case x => x}
<console>:6: error: missing parameter type for expanded function ((x0$1) => x0$1 match {
case (x @ _) => x
})
{case x => x}
^
scala> {case x => x}: (Int => Int)
res1: (Int) => Int = <function1>
scala> {case x => x}: PartialFunction[Int, Int]
res2: PartialFunction[Int,Int] = <function1>
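A sketch of how the expected type changes what the same kind of case block becomes. The object name CaseFunctionDemo and the names sum and evenHalf are made up for illustration.

```scala
object CaseFunctionDemo {
  // Expected type is a Function1 taking a pair: the block becomes an ordinary function.
  val sum: ((Int, Int)) => Int = { case (first, second) => first + second }

  // Expected type is PartialFunction: the block also gets an isDefinedAt check.
  val evenHalf: PartialFunction[Int, Int] = { case n if n % 2 == 0 => n / 2 }

  val sums: List[Int] = List((1, 2), (2, 3), (3, 4)).map(sum)

  // collect applies the partial function only where it is defined.
  val halves: List[Int] = List(1, 2, 3, 4).collect(evenHalf)

  val definedAtThree: Boolean = evenHalf.isDefinedAt(3)

  def main(args: Array[String]): Unit = {
    println(sums)
    println(halves)
    println(definedAtThree)
  }
}
```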
{case(first, second) => first + second} is treated as a PartialFunction literal. See examples in "Partial Functions" section here: http://programming-scala.labs.oreilly.com/ch08.html or section 15.7 of Programming in Scala.
Method map accepts a function. In your first example you create a function, assign it to a variable, and pass it to the map method. In the second example you pass your created function directly, omitting assigning it to a variable. You are doing just the same thing.