Adding an (Int, Int) tuple to a Set in scala [duplicate] - scala

Are the parenthesis around the final tuple really needed? It doesn't compile without them and the compiler tries to add only the Sort("time") and complains that it expects a tuple instead.
val maxSortCounts: Map[Sort, Int] =
sorts.map(s => s -> usedPredicates.map(pred => pred.signature.count(_ == s)).max)
.toMap + ((Sort("time"), 1))
I've tried to reproduce this behaviour inside the REPL with a shorter example, but there it behaves as intended. The variable sorts is a Seq[Sort].
error: type mismatch;
found : <snip>.Sort
required: (<snip>.Sort, Int)
.toMap + (Sort("time"), 1)

Yes, they are needed. Otherwise the compiler will interpret the code as
x.+(y, z) instead of x.+((y, z)).
Instead, you can use ArrowAssoc again: x + (y -> z). Notice, the parentheses are also needed because + and - have the same precedence (only the first sign of a method defines its precedence).

Yes, they're needed. They make the expression a tuple. Parentheses surrounding a comma-separated list create tuple objects. For example, (1, 2, 3) is a 3-tuple of numbers.
Map's + method accepts a pair - in other words a tuple of two elements. Map represents entries in the map as (key,value) tuples.

Related

Execute functions out of a Tuple in Scala

I have a Tuple where I have stored anonymous functions, I want to iterate through them and execute them.
val functions = ((x:Int, y:Int) => x + y, (x:Int, y: Int) => x - y)
// I want to execute the anonymous functions in the Tuple
functions.productIterator.foreach(function => function)
Unfortunately I am not able to do
functions.productIterator.foreach(function => function(1, 2))
OR
functions.productIterator.foreach(_(1, 2))
what is the way out.
Tuples are not meant to be iterated over. The types get lost because each entry in a tuple is able to be a different type and so the type system just assumes Any (thus the Iterator[Any]). So the real suggestion is that if you want to iterate, use a collection like a Seq or Set.
On the other hand, if you know that the tuple contains functions of a particular type, then you can bypass the type checking by casting with asInstanceOf, but this is not recommended because type checking is your friend.
functions.productIterator.map(_.asInstanceOf[(Int,Int)=>Int](1, 2))
// produces `Iterator(3, -1)`
Alternatively, have a look at HLists in Shapeless, which have properties of both tuples and collections.
productIterator on a Tuple returns Iterator[Any] and not Iterator[Function2[Int, Int, Int]] as you're expecting.
We can extract the elements of the tuple into a Seq while preserving type information; hence
Seq(functions._1, functions._2).map(_(1,2))
List(3, -1)

Apache-Spark : What is map(_._2) shorthand for?

I read a project's source code, found:
val sampleMBR = inputMBR.map(_._2).sample
inputMBR is a tuple.
the function map's definition is :
map[U classTag](f:T=>U):RDD[U]
it seems that map(_._2) is the shorthand for map(x => (x._2)).
Anyone can tell me rules of those shorthand ?
The _ syntax can be a bit confusing. When _ is used on its own it represents an argument in the anonymous function. So if we working on pairs:
map(_._2 + _._2) would be shorthand for map(x, y => x._2 + y._2). When _ is used as part of a function name (or value name) it has no special meaning. In this case x._2 returns the second element of a tuple (assuming x is a tuple).
collection.map(_._2) emits a second component of the tuple. Example from pure Scala (Spark RDDs work the same way):
scala> val zipped = (1 to 10).zip('a' to 'j')
zipped: scala.collection.immutable.IndexedSeq[(Int, Char)] = Vector((1,a), (2,b), (3,c), (4,d), (5,e), (6,f), (7,g), (8,h), (9,i), (10,j))
scala> val justLetters = zipped.map(_._2)
justLetters: scala.collection.immutable.IndexedSeq[Char] = Vector(a, b, c, d, e, f, g, h, i, j)
Two underscores in '_._2' are different.
First '_' is for placeholder of anonymous function; Second '_2' is member of case class Tuple.
Something like:
case class Tuple3 (_1: T1, _2: T2, _3: T3)
{...}
I have found the solutions.
First the underscore here is as placeholder.
To make a function literal even more concise, you can use underscores
as placeholders for one or more parameters, so long as each parameter
appears only one time within the function literal.
See more about underscore in Scala at What are all the uses of an underscore in Scala?.
The first '_' is referring what is mapped to and since what is mapped to is a tuple you might call any function within the tuple and one of the method is '_2' so what below tells us transform input into it's second attribute.

Iteration over Scala tuples [duplicate]

This question already has answers here:
Iterate Over a tuple
(4 answers)
Closed 8 years ago.
In scala, we can get an iterator over a tuple as follows
val t = (1, 2)
val it = t.productIterator
and even
it.foreach( x => println(x.isInstanceOf[Int]) )
returns true, we cannot do simple operations on the iterator values without using asInstanceOf[Int], since
it.foreach( x => println(x+1) )
returns an error: type mismatch; found : Int(1) required: String
I understand the issue with Integer vs. Int, but still the validity of isInstanceOf[Int] is somewhat confusing.
What is the best way to do these operations over tuples? Notice that the tuple can have a mix of types like integers with doubles, so converting to a list might not always work.
A tuple does not have to be homogenous and the compiler didn't try to apply magic type unification across the elements1. Take (1, "hello") as an example of such a a heterogenous tuple (Tuple2[Int,String]).
This means that x is typed as Any (not Int!). Try it.foreach( (x: Int) => println(x) ), with the original tuple, to get a better error message indicating that the iterator is not unified over the types of the tuple elements (it is an Iterators[Any]). The error reported should be similar to:
error: type mismatch;
found : (Int) => Unit
required: (Any) => ?
(1, 2).productIterator.foreach( (x: Int) => println(x) )
In this particular case isInstanceOf[Int] can be used to refine the type - from the Any that the type-system gave us - because we know, from manual code inspection, that it will "be safe" with the given tuple.
Here is another look at the iterators/types involved:
(1, 2) // -> Tuple2[Int,Int]
.productIterator // -> Iterator[Any]
.map(_.asInstanceOf[Int]) // -> Iterator[Int]
.foreach(x => println(x+1))
While I would recommend treating tuples as finite sets of homogenous elements and not a sequence, the same rules can be used as when dealing with any Iterator[Any] such as using pattern matching (e.g. match) that discriminates by the actual object type. (In this case the code is using an implicit PartialFunction.)
(1, "hello").productIterator
.foreach {
case s: String => println("string: " + s)
case i: Int => println("int: " + i)
}
1 While it might be possible to make the compiler unify the types in this scenario, it sounds like a special case that requires extra work for minimal gain. Normally sequences like lists - not tuples - are used for homogenous elements and the compiler/type-system correctly gives us a good refinement for something like List(1,2) (which is typed as List[Int] as expected).
There is another type HList, that is like tuple and List all-in-one. See shapeless.
I think, you can get close to what you want:
import shapeless._
val t = 1 :: 2 :: HNil
val lst = t.toList
lst.foreach( x => println(x+1) )

Why doesn't the scala compiler recognize this as a tuple?

If I create a map:
val m = Map((4, 3))
And try to add a new key value pair:
val m_prime = m + (1, 5)
I get:
error: type mismatch;
found : Int(1)
required: (Int, ?)
val m_prime = m + (1, 5)
If I do:
val m_prime = m + ((1, 5))
Or:
val m_prime = m + (1 -> 5)
Then it works. Why doesn't the compiler accept the first example?
I am using 2.10.2
This is indeed very annoying (I run into this frequently). First of all, the + method comes from a general collection trait, taking only one argument—the collection's element type. Map's element type is the pair (A, B). However, Scala interprets the parentheses here as method call parentheses, not a tuple constructor. The explanation is shown in the next section.
To solve this, you can either avoid tuple syntax and use the arrow association key -> value instead, or use double parentheses, or use method updated which is specific to Map. updated does the same as + but takes key and value as separate arguments:
val m_prime = m updated (1, 5)
Still it is unclear why Scala fails here, as in general infix syntax should work and not expect parentheses. It appears that this particular case is broken because of a method overloading: There is a second + method that takes a variable number of tuple arguments.
Demonstration:
trait Foo {
def +(tup: (Int, Int)): Foo
}
def test1(f: Foo) = f + (1, 2) // yes, it works!
trait Baz extends Foo {
def +(tups: (Int, Int)*): Foo // overloaded
}
def test2(b: Baz) = b + (1, 2) // boom. we broke it.
My interpretation is that with the vararg version added, there is now an ambiguity: Is (a, b) a Tuple2 or a list of two arguments a and b (even if a and b are not of type Tuple2, perhaps the compiler would start looking for an implicit conversion). The only way to resolve the ambiguity is to use either of the three approaches described above.

What do _._1 and _++_ mean in Scala (two separate operations)?

My interpretation of _._1 is:
_ = wildcard parameter
_1 = first parameter in method parameter list
But when used together with . what does it signify?
This is how its used :
.toList.sortWith(_._1 < _._1)
For this statement:
_++_
I'm lost. Is it concatenation two wildcard parameters somehow?
This is how its used:
.reduce(_++_)
I would be particularly interested if they above code could be made more verbose and remove any implicits, just so I can understand it better?
_._1 calls the method _1 on the wildcard parameter _, which gets the first element of a tuple. Thus, sortWith(_._1 < _._1) sorts the list of tuple by their first element.
_++_ calls the method ++ on the first wildcard parameter with the second parameter as an argument. ++ does concatenation for sequences. Thus .reduce(_++_) concatenates a list of sequences together. Usually you can use flatten for that.
_1 is a method name. Specifically tuples have a method named _1, which returns the first element of the tuple. So _._1 < _._1 means "call the _1 method on both arguments and check whether the first is less than the second".
And yes, _++_ concatenates both arguments (assuming the first argument has a ++ method that performs concatenation).
.reduce(_++_)
is really just:
.reduce{ (acc, n) => acc ++ n }