Unexpected Scala pattern matching syntax - scala

I had a List of Scala tuples like the following:
val l = List((1,2),(2,3),(3,4))
and I wanted to map it in a list of Int where each item is the sum of the Ints in a the corresponding tuple. I also didn't want to use to use the x._1 notation so I solved the problem with a pattern matching like this
def addTuple(t: (Int, Int)) : Int = t match {
case (first, second) => first + second
}
var r = l map addTuple
Doing that I obtained the list r: List[Int] = List(3, 5, 7) as expected. At this point, almost by accident, I discovered that I can achieve the same result with an abbreviated form like the following:
val r = l map {case(first, second) => first + second}
I cannot find any reference to this syntax in the documentation I have. Is that normal? Am I missing something trivial?

See Section 8.5 of the language reference, "Pattern Matching Anonymous Functions".
An anonymous function can be defined by a sequence of cases
{case p1 =>b1 ... case pn => bn }
which appear as an expression without a prior match. The expected type of such an expression must in part be defined. It must be either scala.Functionk[S1, ..., Sk, R] for some k > 0, or scala.PartialFunction[S1, R], where the argument type(s) S1, ..., Sk must be fully determined, but the result type R may be undetermined.
The expected type deternines whether this is translated to a FunctionN or PartialFunction.
scala> {case x => x}
<console>:6: error: missing parameter type for expanded function ((x0$1) => x0$1 match {
case (x # _) => x
})
{case x => x}
^
scala> {case x => x}: (Int => Int)
res1: (Int) => Int = <function1>
scala> {case x => x}: PartialFunction[Int, Int]
res2: PartialFunction[Int,Int] = <function1>

{case(first, second) => first + second} is treated as a PartialFunction literal. See examples in "Partial Functions" section here: http://programming-scala.labs.oreilly.com/ch08.html or section 15.7 of Programming in Scala.

Method map accepts a function. In your first example you create a function, assign it to a variable, and pass it to the map method. In the second example you pass your created function directly, omitting assigning it to a variable. You are doing just the same thing.

Related

Scala : placeholder inside tuple

I've played a bit with placeholder and found a strange case :
val integers = Seq(1, 2)
val f = (x:Int) => x + 1
integers.map((_, f(_)))
which returns
Seq[(Int, Int => Int)] = List((1,<function1>), (2,<function1>))
I was expecting
Seq[(Int, Int)] = List((1, 2), (2, 3))
If I make the following changes, everything works as expected :
integers.map(i => (i, f(i)))
Any idea why the function f is not applied during the mapping ?
The underscore stands in for the passed argument only once. So in integers.map((_, f(_))) the 1st _ is a value from integers but the 2nd _ has the stand-alone meaning of "partially applied function".
If your anonymous function takes 2 (or more) arguments then you can use 2 (or more) underscores, but each stands in for its passed argument only once.
The Scala compiler can't read your mind, so the _ placeholder syntax is only useful in very simple expressions.
In your example:
integers.map((_, f(_)))
it evaluates the f(_) as a standalone sub-expression, so you end up with something equivalent to this:
x => (x, y => f(y))
Even if the compiler didn't treat f(_) as its own sub-expression, the result would not be the same as what you say want:
integers.map(i => (i, f(i)))
You want both instances of _ to be treated as the same argument, which is not how _ works. Each occurrence of _ in an expression is always treated as a unique argument.

Difference between these two function formats

I am working on spark and not an expert in scala. I have got the two variants of map function. Could you please explain the difference between them.?
first variant and known format.
first variant
val.map( (x,y) => x.size())
Second variant -> This has been applied on tuple
val.map({case (x, y) => y.toString()});
The type of val is RDD[(IntWritable, Text)]. When i tried with first function, it gave error as below.
type mismatch;
found : (org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text) ⇒ Unit
required: ((org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text)) ⇒ Unit
When I added extra parenthesis it said,
Tuples cannot be directly destructured in method or function parameters.
Well you say:
The type of val is RDD[(IntWritable, Text)]
so it is a tuple of arity 2 with IntWritable and Text as components.
If you say
val.map( (x,y) => x.size())
what you're doing is you are essentially passing in a Function2, a function with two arguments to the map function. This will never compile because map wants a function with one argument. What you can do is the following:
val.map((xy: (IntWritable, Text)) => xy._2.toString)
using ._2 to get the second part of the tuple which is passed in as xy (the type annotation is not required but makes it more clear).
Now the second variant (you can leave out the outer parens):
val.map { case (x, y) => y.toString() }
this is special scala syntax for creating a PartialFunction that immediately matches on the tuple that is passed in to access the x and y parts. This is possible because PartialFunction extends from the regular Function1 class (Function1[A,B] can be written as A => B) with one argument.
Hope that makes it more clear :)
I try this in repl:
scala> val l = List(("firstname", "tom"), ("secondname", "kate"))
l: List[(String, String)] = List((firstname,tom), (secondname,kate))
scala> l.map((x, y) => x.size)
<console>:9: error: missing parameter type
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, `{ case (x, y) => ... }`
l.map((x, y) => x.size)
maybe can give you some inspire.
Your first example is a function that takes two arguments and returns a String. This is similar to this example:
scala> val f = (x:Int,y:Int) => x + y
f: (Int, Int) => Int = <function2>
You can see that the type of f is (Int,Int) => Int (just slightly changed this to be returning an int instead of a string). Meaning that this is a function that takes two Int as arguments and returns an Int as a result.
Now the second example you have is a syntactic sugar (a shortcut) for writing something like this:
scala> val g = (k: (Int, Int)) => k match { case (x: Int, y: Int) => x + y }
g: ((Int, Int)) => Int = <function1>
You see that the return type of function g is now ((Int, Int)) => Int. Can you spot the difference? The input type of g has two parentheses. This shows that g takes one argument and that argument must be a Tuple[Int,Int] (or (Int,Int) for short).
Going back to your RDD, what you have is an Collection of Tuple[IntWritable, Text] so the second function will work, whereas the first one will not work.

How is the placeholder different to an explicit parameter in lambda functions? [duplicate]

I want to iterate over a list of values using a beautiful one-liner in Scala.
For example, this one works well:
scala> val x = List(1,2,3,4)
x: List[Int] = List(1, 2, 3, 4)
scala> x foreach println
1
2
3
4
But if I use the placeholder _, it gives me an error:
scala> x foreach println(_ + 1)
<console>:6: error: missing parameter type for expanded function ((x$1) =>x$1.$plus(1))
x foreach println(_ + 1)
^
Why is that? Can't compiler infer type here?
This:
x foreach println(_ + 1)
is equivalent to this:
x.foreach(println(x$1 => x$1 + 1))
There's no indication as to what might be the type of x$1, and, to be honest, it doesn't make any sense to print a function.
You obviously (to me) meant to print x$0 + 1, where x$0 would the the parameter passed by foreach, instead. But, let's consider this... foreach takes, as a parameter, a Function1[T, Unit], where T is the type parameter of the list. What you are passing to foreach instead is println(_ + 1), which is an expression that returns Unit.
If you wrote, instead x foreach println, you'd be passing a completely different thing. You'd be passing the function(*) println, which takes Any and returns Unit, fitting, therefore, the requirements of foreach.
This gets slightly confused because of the rules of expansion of _. It expands to the innermost expression delimiter (parenthesis or curly braces), except if they are in place of a parameter, in which case it means a different thing: partial function application.
To explain this better, look at these examples:
def f(a: Int, b: Int, c: Int) = a + b + c
val g: Int => Int = f(_, 2, 3) // Partial function application
g(1)
Here, we applies the second and third arguments to f, and returned a function requiring just the remaining argument. Note that it only worked as is because I indicated the type of g, otherwise I'd have to indicate the type of the argument I was not applying. Let's continue:
val h: Int => Int = _ + 1 // Anonymous function, expands to (x$1: Int => x$1 + 1)
val i: Int => Int = (_ + 1) // Same thing, because the parenthesis are dropped here
val j: Int => Int = 1 + (_ + 1) // doesn't work, because it expands to 1 + (x$1 => x$1 + 1), so it misses the type of `x$1`
val k: Int => Int = 1 + ((_: Int) + 1) // doesn't work, because it expands to 1 + (x$1: Int => x$1 + 1), so you are adding a function to an `Int`, but this operation doesn't exist
Let discuss k in more detail, because this is a very important point. Recall that g is a function Int => Int, right? So, if I were to type 1 + g, would that make any sense? That's what was done in k.
What confuses people is that what they really wanted was:
val j: Int => Int = x$1 => 1 + (x$1 + 1)
In other words, they want the x$1 replacing _ to jump to outside the parenthesis, and to the proper place. The problem here is that, while it may seem obvious to them what the proper place is, it is not obvious to the compiler. Consider this example, for instance:
def findKeywords(keywords: List[String], sentence: List[String]) = sentence.filter(keywords contains _.map(_.toLowerCase))
Now, if we were to expand this to outside the parenthesis, we would get this:
def findKeywords(keywords: List[String], sentence: List[String]) = (x$1, x$2) => sentence.filter(keywords contains x$1.map(x$2.toLowerCase))
Which is definitely not what we want. In fact, if the _ did not get bounded by the innermost expression delimiter, one could never use _ with nested map, flatMap, filter and foreach.
Now, back to the confusion between anonymous function and partial application, look here:
List(1,2,3,4) foreach println(_) // doesn't work
List(1,2,3,4) foreach (println(_)) // works
List(1,2,3,4) foreach (println(_ + 1)) // doesn't work
The first line doesn't work because of how operation notation works. Scala just sees that println returns Unit, which is not what foreachexpects.
The second line works because the parenthesis let Scala evaluate println(_) as a whole. It is a partial function application, so it returns Any => Unit, which is acceptable.
The third line doesn't work because _ + 1 is anonymous function, which you are passing as a parameter to println. You are not making println part of an anonymous function, which is what you wanted.
Finally, what few people expect:
List(1,2,3,4) foreach (Console println _ + 1)
This works. Why it does is left as an exercise to the reader. :-)
(*) Actually, println is a method. When you write x foreach println, you are not passing a method, because methods can't be passed. Instead, Scala creates a closure and passes it. It expands like this:
x.foreach(new Function1[Any,Unit] { def apply(x$1: Any): Unit = Console.println(x$1) })
The underscore is a bit tricky. According to the spec, the phrase:
_ + 1
is equivalent to
x => x + 1
Trying
x foreach println (y => y + 1)
yields:
<console>:6: error: missing parameter type
x foreach println (y => y + 1)
If you add some types in:
x foreach( println((y:Int) => y + 1))
<console>:6: error: type mismatch;
found : Unit
required: (Int) => Unit
x foreach( println((y:Int) => y + 1))
The problem is that you are passing an anonymous function to println and it's not able to deal with it. What you really want to do (if you are trying to print the successor to each item in the list) is:
x map (_+1) foreach println
scala> for(x <- List(1,2,3,4)) println(x + 1)
2
3
4
5
There is a strange limitation in Scala for the nesting depth of expressions with underscore. It's well seen on the following example:
scala> List(1) map(1+_)
res3: List[Int] = List(2)
scala> Some(1) map (1+(1+_))
<console>:5: error: missing parameter type for expanded function ((x$1) => 1.+(x$1))
Some(1) map (1+(1+_))
^
Looks like a bug for me.
Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client VM, Java 1.6.0_17).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val l1 = List(1, 2, 3)
l1: List[Int] = List(1, 2, 3)
scala>
scala> l1.foreach(println(_))
1
2
3

Why can't a variable be assigned placeholder in function literal?

I'm having trouble understanding underscores in function literals.
val l = List(1,2,3,4,5)
l.filter(_ > 0)
works fine
l.filter({_ > 0})
works fine
l.filter({val x=1; 1+_+3 > 0}) // ie you can have multiple statements in your function literal and use the underscore not just in the first statement.
works fine
And yet:
l.filter({val x=_; x > 0})
e>:1: error: unbound placeholder parameter
l.filter({val x=_; x > 0})
I can't assign the _ to a variable, even though the following is legal function literal:
l.filter(y => {val x=y; x > 0})
works fine.
What gives? Is my 'val x=_' getting interpreted as something else? Thanks!
Actually, you have to back up a step.
You are misunderstanding how the braces work.
scala> val is = (1 to 5).toList
is: List[Int] = List(1, 2, 3, 4, 5)
scala> is map ({ println("hi") ; 2 * _ })
hi
res2: List[Int] = List(2, 4, 6, 8, 10)
If the println were part of the function passed to map, you'd see more greetings.
scala> is map (i => { println("hi") ; 2 * i })
hi
hi
hi
hi
hi
res3: List[Int] = List(2, 4, 6, 8, 10)
Your extra braces are a block, which is some statements followed by a result expression. The result expr is the function.
Once you realize that only the result expr has an expected type that is the function expected by map, you wouldn't think to use underscore in the preceding statements, since a bare underscore needs the expected type to nail down what the underscore means.
That's the type system telling you that your underscore isn't in the right place.
Appendix: in comments you ask:
how can I use the underscore syntax to bind the parameter of a
function literal to a variable
Is this a "dumb" question, pardon the expression?
The underscore is so you don't have to name the parameter, then you say you want to name it.
One use case might be: there are few incoming parameters, but I'm interested in naming only one of them.
scala> (0 /: is)(_ + _)
res10: Int = 15
scala> (0 /: is) { case (acc, i) => acc + 2 * i }
res11: Int = 30
This doesn't work, but one may wonder why. That is, we know what the fold expects, we want to apply something with an arg. Which arg? Whatever is left over after the partially applied partial function.
scala> (0 /: is) (({ case (_, i) => _ + 2 * i })(_))
or
scala> (0 /: is) (({ case (_, i) => val d = 2 * i; _ + 2 * d })(_))
SLS 6.23 "placeholder syntax for anonymous functions" mentions the "expr" boundary for when you must know what the underscore represents -- it's not a scope per se. If you supply type ascriptions for the underscores, it will still complain about the expected type, presumably because type inference goes left to right.
The underscore syntax is mainly user for the following replacement:
coll.filter(x => { x % 2 == 0});
coll.filter(_ % 2 == 0);
This can only replace a single parameter. This is the placeholder syntax.
Simple syntactic sugar for a lambda.
In the breaking case you are attempting null initialization/defaulting.
For primitive types with init conventions:
var x: Int = _; // x will be 0
The general case:
var y: List[String] = _; // y is null
var z: Any = _; // z = null;
To get pedantic, it works because null is a ref to the only instance of scala.Null, a sub-type of any type, which will always satisfy the type bound because of covariance. Look HERE.
A very common usage scenario, in ScalaTest:
class myTest extends FeatureTest with GivenWhenThen with BeforeAndAfter {
var x: OAuthToken = _;
before {
x = someFunctionThatReturnsAToken;
}
}
You can also see why you shouldn't use it with val, since the whole point is to update the value after initialization.
The compiler won't even let you, failing with: error: unbound placeholder parameter.
This is your exact case, the compiler thinks you are defaulting, a behaviour undefined for vals.
Various constraints, such as timing or scope make this useful.
This is different from lazy, where you predefine the expression that will be evaluated when needed.
For more usages of _ in Scala, look HERE.
Because in this two cases underscore (_) means two different things. In case of a function it's a syntactic sugar for lambda function, your l.filter(_ > 0) later desugares into l.filter(x => x > 0). But in case of a var it has another meaning, not a lambda function, but a default value and this behavior is defined only for var's:
class Test {
var num: Int = _
}
Here num gonna be initialized to its default value determined by its type Int. You can't do this with val cause vals are final and if in case of vars you can later assign them some different values, with vals this has no point.
Update
Consider this example:
l filter {
val x = // compute something
val z = _
x == z
}
According to your idea, z should be bound to the first argument, but how scala should understand this, or you you have more code in this computation and then underscore.
Update 2
There is a grate option in scala repl: scala -Xprint:type. If you turn it on and print your code in (l.filter({val x=1; 1+_+3 > 0})), this what you'll see:
private[this] val res1: List[Int] = l.filter({
val x: Int = 1;
((x$1: Int) => 1.+(x$1).+(3).>(0))
});
1+_+3 > 0 desugares into a function: ((x$1: Int) => 1.+(x$1).+(3).>(0)), what filter actually expects from you, a function from Int to Boolean. The following also works:
l.filter({val x=1; val f = 1+(_: Int)+3 > 0; f})
cause f here is a partially applied function from Int to Boolean, but underscore isn't assigned to the first argument, it's desugares to the closes scope:
private[this] val res3: List[Int] = l.filter({
val x: Int = 1;
val f: Int => Boolean = ((x$1: Int) => 1.+((x$1: Int)).+(3).>(0));
f
});

Scala foreach strange behaviour

I want to iterate over a list of values using a beautiful one-liner in Scala.
For example, this one works well:
scala> val x = List(1,2,3,4)
x: List[Int] = List(1, 2, 3, 4)
scala> x foreach println
1
2
3
4
But if I use the placeholder _, it gives me an error:
scala> x foreach println(_ + 1)
<console>:6: error: missing parameter type for expanded function ((x$1) =>x$1.$plus(1))
x foreach println(_ + 1)
^
Why is that? Can't compiler infer type here?
This:
x foreach println(_ + 1)
is equivalent to this:
x.foreach(println(x$1 => x$1 + 1))
There's no indication as to what might be the type of x$1, and, to be honest, it doesn't make any sense to print a function.
You obviously (to me) meant to print x$0 + 1, where x$0 would the the parameter passed by foreach, instead. But, let's consider this... foreach takes, as a parameter, a Function1[T, Unit], where T is the type parameter of the list. What you are passing to foreach instead is println(_ + 1), which is an expression that returns Unit.
If you wrote, instead x foreach println, you'd be passing a completely different thing. You'd be passing the function(*) println, which takes Any and returns Unit, fitting, therefore, the requirements of foreach.
This gets slightly confused because of the rules of expansion of _. It expands to the innermost expression delimiter (parenthesis or curly braces), except if they are in place of a parameter, in which case it means a different thing: partial function application.
To explain this better, look at these examples:
def f(a: Int, b: Int, c: Int) = a + b + c
val g: Int => Int = f(_, 2, 3) // Partial function application
g(1)
Here, we applies the second and third arguments to f, and returned a function requiring just the remaining argument. Note that it only worked as is because I indicated the type of g, otherwise I'd have to indicate the type of the argument I was not applying. Let's continue:
val h: Int => Int = _ + 1 // Anonymous function, expands to (x$1: Int => x$1 + 1)
val i: Int => Int = (_ + 1) // Same thing, because the parenthesis are dropped here
val j: Int => Int = 1 + (_ + 1) // doesn't work, because it expands to 1 + (x$1 => x$1 + 1), so it misses the type of `x$1`
val k: Int => Int = 1 + ((_: Int) + 1) // doesn't work, because it expands to 1 + (x$1: Int => x$1 + 1), so you are adding a function to an `Int`, but this operation doesn't exist
Let discuss k in more detail, because this is a very important point. Recall that g is a function Int => Int, right? So, if I were to type 1 + g, would that make any sense? That's what was done in k.
What confuses people is that what they really wanted was:
val j: Int => Int = x$1 => 1 + (x$1 + 1)
In other words, they want the x$1 replacing _ to jump to outside the parenthesis, and to the proper place. The problem here is that, while it may seem obvious to them what the proper place is, it is not obvious to the compiler. Consider this example, for instance:
def findKeywords(keywords: List[String], sentence: List[String]) = sentence.filter(keywords contains _.map(_.toLowerCase))
Now, if we were to expand this to outside the parenthesis, we would get this:
def findKeywords(keywords: List[String], sentence: List[String]) = (x$1, x$2) => sentence.filter(keywords contains x$1.map(x$2.toLowerCase))
Which is definitely not what we want. In fact, if the _ did not get bounded by the innermost expression delimiter, one could never use _ with nested map, flatMap, filter and foreach.
Now, back to the confusion between anonymous function and partial application, look here:
List(1,2,3,4) foreach println(_) // doesn't work
List(1,2,3,4) foreach (println(_)) // works
List(1,2,3,4) foreach (println(_ + 1)) // doesn't work
The first line doesn't work because of how operation notation works. Scala just sees that println returns Unit, which is not what foreachexpects.
The second line works because the parenthesis let Scala evaluate println(_) as a whole. It is a partial function application, so it returns Any => Unit, which is acceptable.
The third line doesn't work because _ + 1 is anonymous function, which you are passing as a parameter to println. You are not making println part of an anonymous function, which is what you wanted.
Finally, what few people expect:
List(1,2,3,4) foreach (Console println _ + 1)
This works. Why it does is left as an exercise to the reader. :-)
(*) Actually, println is a method. When you write x foreach println, you are not passing a method, because methods can't be passed. Instead, Scala creates a closure and passes it. It expands like this:
x.foreach(new Function1[Any,Unit] { def apply(x$1: Any): Unit = Console.println(x$1) })
The underscore is a bit tricky. According to the spec, the phrase:
_ + 1
is equivalent to
x => x + 1
Trying
x foreach println (y => y + 1)
yields:
<console>:6: error: missing parameter type
x foreach println (y => y + 1)
If you add some types in:
x foreach( println((y:Int) => y + 1))
<console>:6: error: type mismatch;
found : Unit
required: (Int) => Unit
x foreach( println((y:Int) => y + 1))
The problem is that you are passing an anonymous function to println and it's not able to deal with it. What you really want to do (if you are trying to print the successor to each item in the list) is:
x map (_+1) foreach println
scala> for(x <- List(1,2,3,4)) println(x + 1)
2
3
4
5
There is a strange limitation in Scala for the nesting depth of expressions with underscore. It's well seen on the following example:
scala> List(1) map(1+_)
res3: List[Int] = List(2)
scala> Some(1) map (1+(1+_))
<console>:5: error: missing parameter type for expanded function ((x$1) => 1.+(x$1))
Some(1) map (1+(1+_))
^
Looks like a bug for me.
Welcome to Scala version 2.8.0.Beta1-prerelease (Java HotSpot(TM) Client VM, Java 1.6.0_17).
Type in expressions to have them evaluated.
Type :help for more information.
scala> val l1 = List(1, 2, 3)
l1: List[Int] = List(1, 2, 3)
scala>
scala> l1.foreach(println(_))
1
2
3