What do _._1 and _++_ mean in Scala (two separate operations)? - scala

My interpretation of _._1 is:
_ = wildcard parameter
_1 = first parameter in method parameter list
But when used together with . what does it signify?
This is how it's used:
.toList.sortWith(_._1 < _._1)
For this statement:
_++_
I'm lost. Is it concatenating two wildcard parameters somehow?
This is how it's used:
.reduce(_++_)
I would be particularly interested to see whether the above code could be made more verbose, with any implicits removed, just so I can understand it better.

_._1 calls the method _1 on the wildcard parameter _, which gets the first element of a tuple. Thus, sortWith(_._1 < _._1) sorts the list of tuples by their first elements.
_++_ calls the method ++ on the first wildcard parameter with the second parameter as an argument. ++ does concatenation for sequences. Thus .reduce(_++_) concatenates a list of sequences together. Usually you can use flatten for that.

_1 is a method name. Specifically tuples have a method named _1, which returns the first element of the tuple. So _._1 < _._1 means "call the _1 method on both arguments and check whether the first is less than the second".
And yes, _++_ concatenates both arguments (assuming the first argument has a ++ method that performs concatenation).
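For example, here is a quick REPL sketch (with made-up values, not from the original post) showing _1 as an ordinary tuple method:
scala> val pair = (42, "answer")
pair: (Int, String) = (42,answer)
scala> pair._1
res0: Int = 42
scala> List((3, "c"), (1, "a"), (2, "b")).sortWith(_._1 < _._1)
res1: List[(Int, String)] = List((1,a), (2,b), (3,c))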

.reduce(_++_)
is really just:
.reduce{ (acc, n) => acc ++ n }
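To address the request for a more verbose version, here is a self-contained sketch with made-up data (the original question does not show the collection being processed, so a Map of Int to List[Int] is assumed):
// Hypothetical input; the real code presumably starts from some key/value collection.
val pairs = Map(2 -> List(20, 21), 1 -> List(10, 11))
// _._1 < _._1 written out with named parameters:
val sorted = pairs.toList.sortWith((left, right) => left._1 < right._1)
// sorted: List[(Int, List[Int])] = List((1,List(10, 11)), (2,List(20, 21)))
// _ ++ _ written out with named parameters:
val concatenated = sorted.map(_._2).reduce((acc, next) => acc ++ next)
// concatenated: List[Int] = List(10, 11, 20, 21)
// As noted above, flatten gives the same result here:
val flattened = sorted.map(_._2).flatten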


Folding list in scala using /: and :\ operator
I tried looking at different sites, but they only talk about the foldRight and foldLeft functions.
def sum(xs: List[Int]): Int = (0 /: xs) (_ + _)
sum(List(1,2,3))
res0: 6
The code segment works as described, but I am not able to completely understand the method definition. What I understand is the part inside the first parentheses, 0 /: xs, where /: is a right-associative operator, so the object is xs and the parameter is 0. I am not sure about the return type of that operation (most probably it would be another list?). The second part is a function that sums its two parameters, but I don't understand which object invokes it, or what the function's name is. Can someone please help me understand?
The signature of /: is
/:[B](z: B)(op: (B, A) ⇒ B): B
It is a method with multiple argument lists, so when it is invoked with only one argument (i.e. 0 /: xs in your case), the remaining part of the signature, (op: (B, A) ⇒ B): B, still has to be satisfied. So you have to pass it a function with two parameters (_ + _) that is used to combine the elements of the list, starting from z.
This method is usually called foldLeft:
(0 /: xs)(_ + _) is the same as xs.foldLeft(0)(_ + _)
You can find more details here: https://www.scala-lang.org/api/2.12.3/scala/collection/immutable/List.html
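Here is a minimal REPL sketch of that equivalence (illustrative values; /: is deprecated in Scala 2.13 in favour of foldLeft):
scala> val xs = List(1, 2, 3)
xs: List[Int] = List(1, 2, 3)
scala> (0 /: xs)(_ + _)
res0: Int = 6
scala> xs.foldLeft(0)(_ + _)
res1: Int = 6
scala> xs.foldLeft(0)((acc, x) => acc + x)
res2: Int = 6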
Thanks @HaraldGliebe & @LuisMiguelMejíaSuárez for your great responses. I am enlightened now! I am just summarising the answer here, which may benefit others who read this thread.
"/:" is actually the name of the function which is defined inside the List class. The signature of the function is: /:[B](z: B)(op: (B, A) ⇒ B): B --> where B is the type parameter, z is the first parameter; op is the second parameter which is of functional type.
The function is curried, which means we can pass fewer parameters than the actual number. If we do that,
the resulting partially applied function can be stored in a temporary variable; we can then use that variable to pass the remaining parameters.
If supplied with all parameters, "/:" can be called as x./:(0)(_+_), where x is a val/var of List type. Alternatively, "/:" can be called in two steps:
step 1: val temp = x./:(0) _ where we pass only the first parameter list. This results in a partially applied function, which is stored in the temp variable.
step 2: temp(_+_) where the partially applied function temp is passed the second (final) parameter.
If we decide to follow the first style (x./:(0)(_+_)), the first parameter list can be written in operator notation as x /: 0.
However, since the method name ends with a colon, the method is invoked on the operand to its right. So x /: 0 is invalid and it has to be written as 0 /: x instead.
This plays the role of the temp variable above. After 0 /: x, the second parameter still has to be passed, so the whole construct becomes (0 /: x)(_+_).
This is how the definition of the function sum in the question, is interpreted.
We have to note that when we use the curried version of the function in operator notation, we have to supply all the parameters in one go.
That is, (0 /: x) (_) or (0 /: x) _ appear to throw syntax errors.
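For completeness, here is a hedged sketch of the two-step style in method-call notation (this assumes Scala 2.12, where /: is still available; the trailing underscore performs the eta-expansion):
val x = List(1, 2, 3)
// Step 1: supply only the first parameter list; the trailing underscore
// turns the remaining parameter list into a function value.
val temp: ((Int, Int) => Int) => Int = x./:(0) _
// Step 2: supply the combining function to the partially applied function.
val total = temp(_ + _)            // 6
// Operator notation has to supply everything in a single expression:
val sameTotal = (0 /: x)(_ + _)    // 6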

Spark Group By Key to (Key,List) Pair

I am trying to group some data by key where the value would be a list:
Sample data:
A 1
A 2
B 1
B 2
Expected result:
(A,(1,2))
(B,(1,2))
I am able to do this with the following code:
data.groupByKey().mapValues(List(_))
The problem is that when I then try to do a Map operation like the following:
groupedData.map((k,v) => (k,v(0)))
It tells me I have the wrong number of parameters.
If I try:
groupedData.map(s => (s(0),s(1)))
It tells me that "(Any,List(Iterable(Any)) does not take parameters"
No clue what I am doing wrong. Is my grouping wrong? What would be a better way to do this?
Scala answers only please. Thanks!!
You're almost there. Just replace List(_) with _.toList
data.groupByKey.mapValues(_.toList)
When you write an anonymous inline function of the form
ARGS => OPERATION
the entire part before the arrow (=>) is taken as the argument list. So, in the case of
(k, v) => ...
the interpreter takes that to mean a function that takes two arguments. In your case, however, you have a single argument which happens to be a tuple (here, a Tuple2, or a Pair - more fully, you appear to have a collection of Pair[Any, List[Any]]). There are a couple of ways to get around this. You can write the anonymous function in the form of a partial function that matches on tuples (note the braces, which the case syntax requires):
groupedData.map { case (k, v) => (k, v(0)) }
Alternatively, you can go with a single named argument, as per your last attempt, but - realising it is a tuple - reference the specific field(s) within the tuple that you need:
groupedData.map(s => (s._2(0),s._2(1))) // The key is s._1, and the value list is s._2
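For reference, the same shape can be reproduced with plain Scala collections (a sketch with made-up data; the Spark RDD API behaves analogously for this example):
// Hypothetical data mirroring the sample in the question.
val data = List(("A", 1), ("A", 2), ("B", 1), ("B", 2))
// Group by the key and keep only the values in each group:
val groupedData = data.groupBy(_._1).mapValues(_.map(_._2))
// Map(A -> List(1, 2), B -> List(1, 2))
// Pattern match on each (key, values) pair, as suggested above:
val firstValues = groupedData.map { case (k, v) => (k, v(0)) }
// Map(A -> 1, B -> 1)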

Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:
val sampleMBR = inputMBR.map(_._2).sample
inputMBR is a tuple.
the function map's definition is :
map[U: ClassTag](f: T => U): RDD[U]
it seems that map(_._2) is the shorthand for map(x => (x._2)).
Anyone can tell me rules of those shorthand ?
The _ syntax can be a bit confusing. When _ is used on its own, it represents an argument of the anonymous function. So, if we are working on pairs,
map(_._2 + _._2) would be shorthand for map((x, y) => x._2 + y._2) (each underscore stands for the next argument in turn, so this form only fits methods that expect a two-argument function, such as reduce). When _ is used as part of a function name (or value name) it has no special meaning. In this case x._2 returns the second element of a tuple (assuming x is a tuple).
collection.map(_._2) emits a second component of the tuple. Example from pure Scala (Spark RDDs work the same way):
scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))
scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
The two underscores in '_._2' are different.
The first '_' is the placeholder for the anonymous function's parameter; '_2' is a member of the tuple case classes, which look something like:
case class Tuple3[T1, T2, T3](_1: T1, _2: T2, _3: T3) { ... }
I have found the solution.
First, the underscore here acts as a placeholder:
To make a function literal even more concise, you can use underscores
as placeholders for one or more parameters, so long as each parameter
appears only one time within the function literal.
See more about underscore in Scala at What are all the uses of an underscore in Scala?.
The first '_' refers to the element being mapped over; since that element is a tuple, you can call any of the tuple's methods on it, and one of those methods is '_2'. So map(_._2) transforms each input element into its second component.

Scala collections: why do we need a case statement to extract values tuples in higher order functions?

Related to Tuple Unpacking in Map Operations, I don't understand why we need a case (which looks like a partial function to me) to extract values from a tuple, like this:
arrayOfTuples map {case (e1, e2) => e1.toString + e2}
Instead of extracting the values the same way it works in foldLeft, for example:
def sum(list: List[Int]): Int = list.foldLeft(0)((r,c) => r+c)
In any case, we don't specify the types of the parameters in the first example, so why do we need the case statement?
Because in Scala function argument lists and tuples are not a unified concept as they are in Haskell and other functional languages. So a function:
(t: (Int, Int)) => ...
is not the same thing as a function:
(e1: Int, e2: Int) => ...
In the first case you can use pattern matching to extract the tuple elements, and that's always done using case syntax. Actually, the expression:
{case (e1, e2) => ...}
is shorthand for:
t => t match {case (e1, e2) => ...}
There have been some discussions about unifying tuples and function argument lists, but there are complications regarding Java overloading rules, and also default/named arguments. So I think it is unlikely the concepts will ever be unified in Scala.
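As a small sketch of the equivalence described in this answer (arrayOfTuples is a made-up value):
val arrayOfTuples = Array((1, "a"), (2, "b"))
// Partial-function syntax, as in the question:
arrayOfTuples map { case (e1, e2) => e1.toString + e2 }              // Array(1a, 2b)
// What that is shorthand for:
arrayOfTuples map { t => t match { case (e1, e2) => e1.toString + e2 } }
// A plain one-parameter function using the tuple's fields directly:
arrayOfTuples map (t => t._1.toString + t._2)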
Lambda with one primitive parameter
With
var listOfInt=(1 to 100).toList
listOfInt.foldRight(0)((current,acc)=>current+acc)
you have a lambda function operating on two parameters.
Lambda with one parameter of type tuple
With
var listOfTuple=List((1,"a"),(2,"b"),(3," "))
listOfTuple.map(x => x._1.toString + x._2.toString)
you have a lambda function working on one parameter (of type Tuple2[Int, String])
Both work fine with type inference.
Partial lambda with one parameter
With
listOfTuple.map{case (x,y) => x.toString + y.toString}
you have a lambda function working on one parameter (of type Tuple2[Int, String]). This lambda function then uses Tuple2.unapply internally to decompose the one parameter into multiple values. This still works fine with type inference. The case is needed for the decomposition ("pattern matching") of the value.
This example is a little bit unintuitive, because unapply returns a tuple as its result. In this special case there might indeed be a trick that lets Scala use the provided tuple directly, but I am not really aware of such a trick.
Update: Lambda function with currying
Indeed there is a trick. With
import Function.tupled
listOfTuple map tupled{(x,y) => x.toString + y.toString}
you can directly work with the tuple. But of course this is really a trick: You provide a function operating on two parameters and not with a tuple. tupled then takes that function and changes it to a different function, operating on a tuple. This technique is also called uncurrying.
Remark:
The y.toString is superfluous when y is already a string. This is not considered good style. I leave it in for the sake of the example. You should omit it in real code.

Adding an (Int, Int) tuple to a Set in scala [duplicate]

Are the parentheses around the final tuple really needed? It doesn't compile without them; the compiler tries to add only Sort("time") and complains that it expects a tuple instead.
val maxSortCounts: Map[Sort, Int] =
sorts.map(s => s -> usedPredicates.map(pred => pred.signature.count(_ == s)).max)
.toMap + ((Sort("time"), 1))
I've tried to reproduce this behaviour inside the REPL with a shorter example, but there it behaves as intended. The variable sorts is a Seq[Sort].
error: type mismatch;
found : <snip>.Sort
required: (<snip>.Sort, Int)
.toMap + (Sort("time"), 1)
Yes, they are needed. Otherwise the compiler will interpret the code as
x.+(y, z) instead of x.+((y, z)).
Instead, you can use ArrowAssoc again: x + (y -> z). Note that the parentheses are still needed, because + and -> have the same precedence (only the first character of an operator's name determines its precedence).
Yes, they're needed. They make the expression a tuple. Parentheses surrounding a comma-separated list create tuple objects. For example, (1, 2, 3) is a 3-tuple of numbers.
Map's + method accepts a pair - in other words a tuple of two elements. Map represents entries in the map as (key,value) tuples.
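A short sketch of the two working spellings (Sort is not reproduced here, so a plain String key stands in for it; this assumes Scala 2.12, where Map's + also has a multi-argument overload):
val base = Map("length" -> 2)
// Extra parentheses make the pair a single argument to +:
base + (("time", 1))          // Map(length -> 2, time -> 1)
// ArrowAssoc builds the pair before + is applied:
base + ("time" -> 1)          // Map(length -> 2, time -> 1)
// Without either, the compiler treats ("time", 1) as two separate arguments
// to +, and neither is a (key, value) pair on its own:
// base + ("time", 1)         // error: type mismatch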