Transforming a Map using map in Scala

Given a string, I want to create a map that, for each character in the string, gives the number of times that character occurs in it. The following function makes a map from each character to the list of its occurrences (a List[Char]).
def wordOccurrences(w: String) = {
val lower = w.toLowerCase.toList
lower.groupBy(t => t)
}
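For reference, groupBy(t => t) (grouping by identity) yields a map from each character to the list of its occurrences. A quick REPL sketch (key order may differ, since the result is an unordered Map):
scala> wordOccurrences("Hello")
res0: scala.collection.immutable.Map[Char,List[Char]] = Map(e -> List(e), h -> List(h), l -> List(l, l), o -> List(o))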
Now I wanted to alter the last line to:
lower.groupBy(t => t) map ( (x,y) => x -> y.length)
But it doesn't work. Can someone explain why, and how to fix it?

For mapping purposes, a Map[K, V] is an Iterable[(K, V)] (notice the extra pair of parentheses, identifying a tuple type), meaning that when you map over it you have to pass a function that goes from (K, V) to your target type.
What you are doing, however, is passing a function that takes two independent arguments, rather than a single tuple argument.
The difference can be seen by inspecting the types of these two functions in the Scala shell:
scala> :t (a: Int, b: Int) => a + b
(Int, Int) => Int
scala> :t (p: (Int, Int)) => p._1 + p._2
((Int, Int)) => Int
Notice how the former takes two arguments while the latter takes a single tuple.
What you can do is pass a function which decomposes the tuple so that you can bind the components of the tuple independently:
lower.groupBy(t => t) map { case (x, y) => x -> y.length }
or, alternatively, pass a function which uses the tuple without deconstructing it:
lower.groupBy(t => t) map (p => p._1 -> p._2.length)
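Putting it together, a minimal sketch of the full function with either fix applied (Scala 2 syntax):
def wordOccurrences(w: String): Map[Char, Int] = {
  val lower = w.toLowerCase.toList
  lower.groupBy(t => t).map { case (x, y) => x -> y.length }
}
wordOccurrences("Hello") // Map(h -> 1, e -> 1, l -> 2, o -> 1)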
Note
Dotty, the project that Scala's original author Martin Odersky is currently working on and that will probably become Scala 3, supports the syntax you are proposing, calling the feature function arity adaptation. This has been discussed, along with other features, in Odersky's 2016 keynote at Scala eXchange, "From DOT to Dotty" (here is the video taped at 2017 Voxxed Days CERN).

You can use
lower.groupBy(t => t).mapValues(_.length)
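Note that in Scala 2.12 mapValues returns a lazy view over the original map, and in 2.13 it is deprecated on Map; the strict equivalent in 2.13 is:
lower.groupBy(t => t).view.mapValues(_.length).toMap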

Related

How does the underscore placeholder in (println _) represent an entire parameter list of its original function literal?

I tried to make an example to show that the placeholder _ without parentheses can represent (be expanded to) any number of parameters of any type, not just a single parameter. However, the example I first made was incorrect, since the function literal passed to foreach would still take only one parameter. I made a modified example to illustrate this rule more simply:
val list1 = List(1,2,3)
val list2 = List((1,2),(3,4),(5,6))
val list3 = List((1,2,3),(4,5,6),(7,8,9))
scala> list1.foreach(println _) // _ is expanded to 1 parameter in each iteration
1
2
3
scala> list2.foreach(println _) // _ is expanded to 2 parameters in each iteration
(1,2)
(3,4)
(5,6)
scala> list3.foreach(println _) // _ is expanded to 3 parameters in each iteration
(1,2,3)
(4,5,6)
(7,8,9)
This might be able to explain the rule more clearly.
I hope it is correct.
// The original question
In Chapter 8.6, "Partially Applied Functions", of the book Programming in Scala, 3rd Edition, an example shows:
val list = List(1,2,3)
list.foreach(x => println(x))
The context says the function literal
println _
can substitute for
x => println(x)
because the _ can represent an entire parameter list.
I know that an underscore separated by a space from the function name (println, in this case) means the underscore represents an entire parameter list.
In this case, however, there is only one parameter (the Int element of each iteration) in the original function literal.
Why does this tutorial say _ represents an entire parameter list?
The function literal
x => println(x) // Only one parameter? Where's the entire parameter list?
in
list.foreach(x => println(x))
obviously has only one parameter, correct?
Why does this tutorial say _ represents an entire parameter list?
Because it's talking about the entire parameter list of println, which has only one parameter.
Do you mean println _ represents println(element1: Int, element2: Int, ... elementN: Int)
No. To determine the meaning of println _, we look at its signature:
def println(x: Any): Unit
"The entire parameter list" is (x: Any), so println _ is the same as (x: Any) => println(x). If you have def foo(x: Int, y: Int) = x + y, then foo _ will be (x: Int, y: Int) => foo(x, y).
Note: there is also the overload with no parameters, def println(): Unit, but the compiler rules it out here because foreach expects a function with a single parameter; but, e.g., in
val f: () => Unit = println _
println _ is equivalent to () => println() instead of (x: Any) => println(x).
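A quick REPL sketch illustrating the expansion over an entire parameter list (output shown as in the Scala 2.11 REPL; newer versions display lambdas differently):
scala> def foo(x: Int, y: Int) = x + y
foo: (x: Int, y: Int)Int

scala> val g = foo _ // eta-expansion over foo's entire parameter list
g: (Int, Int) => Int = <function2>

scala> g(1, 2)
res0: Int = 3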

Why is scala.collection.immutable.List[Object] not GenTraversableOnce[?]

Simple question, and sorry if this is a stupid one, as I am just beginning in Scala. I am getting a type mismatch error that says:
found : (AnyRef, org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable) => List[Object]
required: ((AnyRef, org.apache.tinkerpop.gremlin.hadoop.structure.io.VertexWritable)) => scala.collection.GenTraversableOnce[?]
But according to this post (I have a Scala List, how can I get a TraversableOnce?), a scala.collection.immutable.List is an Iterable and therefore also a GenTraversableOnce. And yet this error seems to indicate otherwise. And furthermore, when I actually look at the link in the accepted answer of that post, I don't see any reference to the word "traversable".
If the problem has to do with my inner class not being correct, then I have to say this error is extremely uninformative, since requiring that the inner class be of type "?" is obviously a vacuous statement ... Any help in understanding this would be appreciated.
Function2[X, Y, Z] is not the same thing as Function1[(X, Y), Z].
Compare these two definitions:
val f: ((Int, Int)) => Int = xy => xy._1 + xy._2
val f: (Int, Int) => Int = (x, y) => x + y
The first could also be written with a pattern-matching, that first decomposes the tuple:
val f: ((Int, Int)) => Int = { case (x, y) => x + y }
This is exactly what the error message asks you to do: provide a unary function that takes a tuple as its argument, not a binary function. Note that there is the tupled method, which does exactly that.
The return types of the functions are mostly irrelevant here, the compiler doesn't get to unify them, because it fails on the types of the inputs.
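A small sketch of tupled, which converts a two-argument function into a unary function over a tuple:
val f2: (Int, Int) => Int = (x, y) => x + y
val f1: ((Int, Int)) => Int = f2.tupled
List((1, 2), (3, 4)).map(f1) // List(3, 7)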
Also related:
Same story with eta-expansions: Why does my implementation of Haskell snd not compile in Scala

flatMap with a Map in Scala

Why doesn't this work:
val m = Map(1 -> 2, 2 -> 4, 3 -> 6)
def h(k: Int, v: Int) = if (v > 2) Some(k->v) else None
m.flatMap { case(k,v) => h(k,v) }
m.flatMap { (k,v) => h(k,v) }
The one with the case statement gives me:
res1: scala.collection.immutable.Map[Int,Int] = Map(2 -> 4, 3 -> 6)
but the other one fails, saying "missing parameter type" for v, with expected: Int, actual: (Int, Int).
The case keyword signifies pattern matching, so the Tuple2 (a Map is an Iterable of Tuple2 elements) that you are flatMapping over gets decomposed into k and v. (The fact that flatMap works when the h function produces an Option rather than a Map or Iterable is the Scala collections library being perhaps overly permissive.)
Without the case keyword, you are providing a function that requires two arguments, but flatMap needs a function that accepts a single argument (a Tuple2). So the second version does not typecheck.
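If you want to keep the two-argument definition of h, one sketch is to eta-expand it and tuple it explicitly:
m.flatMap((h _).tupled) // Map(2 -> 4, 3 -> 6)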
For the second one, if you don't want to use case, you can do this:
m.flatMap { x => h(x._1, x._2) } // x is the (key, value) pair here (each element of the map); the key and value are accessed as _1 and _2
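As a side note, this filter-and-transform pattern is what collect expresses directly (a sketch):
m.collect { case (k, v) if v > 2 => k -> v } // Map(2 -> 4, 3 -> 6)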

Difference between these two function formats

I am working on Spark and am not an expert in Scala. I have got two variants of a map function. Could you please explain the difference between them?
First variant (the familiar format):
val.map( (x,y) => x.size())
Second variant (applied to a tuple):
val.map({case (x, y) => y.toString()});
The type of val is RDD[(IntWritable, Text)]. When I tried the first function, it gave the error below:
type mismatch;
found : (org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text) ⇒ Unit
required: ((org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.Text)) ⇒ Unit
When I added extra parentheses, it said:
Tuples cannot be directly destructured in method or function parameters.
Well you say:
The type of val is RDD[(IntWritable, Text)]
so its elements are tuples of arity 2, with IntWritable and Text as components.
If you say
val.map( (x,y) => x.size())
what you are essentially doing is passing a Function2, a function with two arguments, to map. This will never compile, because map wants a function with one argument. What you can do is the following:
val.map((xy: (IntWritable, Text)) => xy._2.toString)
using ._2 to get the second component of the tuple, which is passed in as xy (the type annotation is not required, but makes things clearer).
Now the second variant (you can leave out the outer parentheses):
val.map { case (x, y) => y.toString() }
this is special Scala syntax for creating a PartialFunction that immediately matches on the tuple that is passed in, in order to access the x and y parts. This works because PartialFunction extends the regular Function1 class (Function1[A, B] can be written as A => B), a function with one argument.
Hope that makes it more clear :)
I tried this in the REPL:
scala> val l = List(("firstname", "tom"), ("secondname", "kate"))
l: List[(String, String)] = List((firstname,tom), (secondname,kate))
scala> l.map((x, y) => x.size)
<console>:9: error: missing parameter type
Note: The expected type requires a one-argument function accepting a 2-Tuple.
Consider a pattern matching anonymous function, `{ case (x, y) => ... }`
l.map((x, y) => x.size)
Maybe this can give you some inspiration.
Your first example is a function that takes two arguments and returns a String. This is similar to this example:
scala> val f = (x:Int,y:Int) => x + y
f: (Int, Int) => Int = <function2>
You can see that the type of f is (Int, Int) => Int (just slightly changed to return an Int instead of a String). This means f is a function that takes two Ints as arguments and returns an Int as a result.
Now the second example you have is syntactic sugar (a shortcut) for writing something like this:
scala> val g = (k: (Int, Int)) => k match { case (x: Int, y: Int) => x + y }
g: ((Int, Int)) => Int = <function1>
You see that the type of the function g is now ((Int, Int)) => Int. Can you spot the difference? The input type of g has two parentheses. This shows that g takes one argument, and that argument must be a Tuple2[Int, Int] (or (Int, Int) for short).
Going back to your RDD, what you have is a collection of Tuple2[IntWritable, Text], so the second function will work, whereas the first one will not.
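A minimal sketch of the same contrast, with a plain List standing in for the RDD:
val pairs = List((1, "a"), (2, "bb"))
// pairs.map((x, y) => y.length)       // does not compile: map wants a one-argument function
pairs.map { case (x, y) => y.length }  // List(1, 2)
pairs.map(xy => xy._2.length)          // List(1, 2), without pattern matching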

Unexpected Scala pattern matching syntax

I had a List of Scala tuples like the following:
val l = List((1,2),(2,3),(3,4))
and I wanted to map it to a list of Ints where each item is the sum of the Ints in the corresponding tuple. I also didn't want to use the x._1 notation, so I solved the problem with pattern matching like this:
def addTuple(t: (Int, Int)) : Int = t match {
case (first, second) => first + second
}
var r = l map addTuple
Doing that I obtained the list r: List[Int] = List(3, 5, 7) as expected. At this point, almost by accident, I discovered that I can achieve the same result with an abbreviated form like the following:
val r = l map {case(first, second) => first + second}
I cannot find any reference to this syntax in the documentation I have. Is that normal? Am I missing something trivial?
See Section 8.5 of the language reference, "Pattern Matching Anonymous Functions".
An anonymous function can be defined by a sequence of cases
{ case p1 => b1 ... case pn => bn }
which appear as an expression without a prior match. The expected type of such an expression must in part be defined. It must be either scala.Functionk[S1, ..., Sk, R] for some k > 0, or scala.PartialFunction[S1, R], where the argument type(s) S1, ..., Sk must be fully determined, but the result type R may be undetermined.
The expected type determines whether this is translated to a FunctionN or a PartialFunction.
scala> {case x => x}
<console>:6: error: missing parameter type for expanded function ((x0$1) => x0$1 match {
case (x # _) => x
})
{case x => x}
^
scala> {case x => x}: (Int => Int)
res1: (Int) => Int = <function1>
scala> {case x => x}: PartialFunction[Int, Int]
res2: PartialFunction[Int,Int] = <function1>
{ case (first, second) => first + second } is treated as a PartialFunction literal. See the examples in the "Partial Functions" section here: http://programming-scala.labs.oreilly.com/ch08.html or section 15.7 of Programming in Scala.
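The PartialFunction nature only matters for methods such as collect, which can ask whether the function is defined at a value; map simply uses the case literal as an ordinary Function1. A small sketch:
val sum: PartialFunction[(Int, Int), Int] = { case (first, second) => first + second }
sum.isDefinedAt((1, 2))               // true
List((1, 2), (2, 3), (3, 4)).map(sum) // List(3, 5, 7)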
Method map accepts a function. In your first example you define a named function and pass it to the map method. In the second example you pass the function literal to map directly, without naming it first. You are doing just the same thing.