x is a Dataset[Long] with a single column, created using SparkSession.range.
A single reduce operation on x using the anonymous addition function _+_ should return a Long value.
But instead I get the following error:
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.0
      /_/
Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 11.0.11)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@a90447f
scala> val x = spark.range(0, 10000000, 10)
x: org.apache.spark.sql.Dataset[Long] = [id: bigint]
scala> x.reduce(_+_)
<console>:26: error: overloaded method value reduce with alternatives:
(func: org.apache.spark.api.java.function.ReduceFunction[java.lang.Long])java.lang.Long <and>
(func: (java.lang.Long, java.lang.Long) => java.lang.Long)java.lang.Long
cannot be applied to ((java.lang.Long, java.lang.Long) => scala.Long)
x.reduce(_+_)
Even after writing a function with well-defined types, we get the same error:
scala> def add(a:Long, b:Long):Long = a+b
add: (a: Long, b: Long)Long
scala> x reduce (add(_,_))
<console>:28: error: overloaded method value reduce with alternatives:
(func: org.apache.spark.api.java.function.ReduceFunction[java.lang.Long])java.lang.Long <and>
(func: (java.lang.Long, java.lang.Long) => java.lang.Long)java.lang.Long
cannot be applied to ((java.lang.Long, java.lang.Long) => scala.Long)
x reduce (add(_,_))
But if I write an aggregating function that explicitly uses java.lang.Long for both the parameters and the return type, only then does it work:
scala> def add(a:java.lang.Long, b:java.lang.Long):java.lang.Long = a+b
add: (a: Long, b: Long)Long
scala> x.reduce(add(_,_))
res10: Long = 4999995000000
I don't think I can be the only one facing this issue. Do we really have to use java.lang.Long everywhere when working with Long in Spark with Scala?
There has to be a better method; this is far too verbose.
The reduce() overloads take either a scala.Long or a java.lang.Long function and return the corresponding type.
spark.range() returns a Dataset[java.lang.Long]; you can either box the result of the addition back to java.lang.Long using Long.box(), or convert the Dataset to scala.Long with .as[Long] and then call reduce():
scala> spark.range(0, 10000000, 10).reduce((a,b) => Long.box(a + b))
res4: Long = 4999995000000
scala> spark.range(0, 10000000, 10).as[Long].reduce(_+_)
res5: Long = 4999995000000
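For completeness, a sketch of a third option: select the Java overload explicitly with a ReduceFunction, whose declared java.lang.Long signature leaves nothing ambiguous (assuming Spark 3.x on the classpath):

import org.apache.spark.api.java.function.ReduceFunction

// the declared java.lang.Long return type removes the boxing ambiguity
val total: java.lang.Long = spark.range(0, 10000000, 10).reduce(
  new ReduceFunction[java.lang.Long] {
    override def call(a: java.lang.Long, b: java.lang.Long): java.lang.Long = a + b
  })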
When I create a partial function, why can't I immediately invoke it?
Both res6 and res8 have the same type (Function1), so I'm not sure why res7 (immediately invoking res6) works, while what would have been res9 fails:
scala> ((x: Int) => x + 1)
res6: Int => Int = <function1>
scala> ((x: Int) => x + 1)(1)
res7: Int = 2
scala> def adder(a: Int, b: Int) = a + b
adder: (a: Int, b: Int)Int
scala> adder(1, _: Int)
res8: Int => Int = <function1>
scala> adder(1, _: Int)(1)
<console>:12: error: Int does not take parameters
adder(1, _: Int)(1)
^
scala> (adder(1, _: Int))(1)
res10: Int = 2
I think you've just found one of the little Scala compiler quirks.
I don't know how this case is implemented exactly in the parser, but from the looks of it Scala thinks you are invoking a function with multiple parameter lists (of the form f(...)(...)). Thus you need to explicitly surround the partially applied function with parentheses, so that the compiler can disambiguate between the f()() and f(_)() forms. In res8 there are no parameters following the function, so there is no ambiguity. Otherwise, Scala would require you to follow the function with an underscore if you omit the parameter list: f _ or f() _.
You can still immediately apply that function as you show in res10.
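As a small sketch of the unambiguous forms mentioned above:

def adder(a: Int, b: Int) = a + b

val g = adder(1, _: Int)   // nothing follows the placeholder, so no ambiguity: Int => Int
g(1)                       // 2

val h = adder _            // explicit eta-expansion of the method: (Int, Int) => Int
h(1, 2)                    // 3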
adder(1, _: Int)(1)
This fails because the Scala compiler expands it to:
((x: Int) => adder(1, x)(1))
This happens because the compiler can't infer the wildcard's intended scope, just as it can't infer the parameter types of a bare:
_ + _
scala> _ + _
<console>:17: error: missing parameter type for expanded function ((x$1, x$2) => x$1.$plus(x$2))
_ + _
^
<console>:17: error: missing parameter type for expanded function ((x$1: <error>, x$2) => x$1.$plus(x$2))
_ + _
so you have to delimit the expansion explicitly with parentheses, as you did:
(adder(1, _: Int))(1)
which is expanded to:
((x: Int) => adder(1, x))(1)
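The same delimiting trick makes the _ + _ example compile, as a sketch: give the placeholders types and parenthesize the expression so the expansion stops at the parentheses:

val plus = (_: Int) + (_: Int)   // expands to (x1: Int, x2: Int) => x1 + x2
plus(1, 2)                       // 3

((_: Int) + (_: Int))(1, 2)      // immediately applied, like res10: 3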
I'm trying to write some generic testing software for various types using a parameterized base class. I get match errors though on code which I don't believe should be possible.
abstract class AbstractTypeTest[TestType: ClassTag, DriverType <: AnyRef : ClassTag]( // constructor parameters elided

  def checkNormalRowConsistency(
      expectedKeys: Seq[TestType],
      results: Seq[(TestType, TestType, TestType)]) {
    val foundKeys = results.filter {
      case (pkey: TestType, ckey: TestType, data: TestType) => expectedKeys.contains(pkey)
      case x => throw new RuntimeException(s"${x.getClass}")
    }
    foundKeys should contain theSameElementsAs expectedKeys.map(x => (x, x, x))
  }
}
An extending class specifies the TestType parameter, but this code block throws runtime exceptions (because we fail to match what I believe should be the only possible type here).
The code above works for some extending types but not for others; in this case I'm extending the class with TestType = Int:
[info] java.lang.RuntimeException: class scala.Tuple3
I can remove the type on the filter and the ScalaTest check works perfectly but I'd like to understand why the MatchError occurs.
If you're on Scala 2.10, then ClassTag-based pattern matching fails for primitives:
scala> import reflect._
import reflect._
scala> def f[A: ClassTag](as: Seq[(A,A,A)]) = as filter {
| case (x: A, y: A, z: A) => true ; case _ => false }
f: [A](as: Seq[(A, A, A)])(implicit evidence$1: scala.reflect.ClassTag[A])Seq[(A, A, A)]
scala> val vs = List((1,2,3),(4,5,6))
vs: List[(Int, Int, Int)] = List((1,2,3), (4,5,6))
scala> f(vs)
res0: Seq[(Int, Int, Int)] = List()
versus 2.11
scala> f[Int](vs)
res4: Seq[(Int, Int, Int)] = List((1,2,3), (4,5,6))
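If you're stuck on 2.10, one workaround (a sketch, handling only the Int case by hand; f2 is a hypothetical name) is to compare runtime classes through the ClassTag yourself:

import scala.reflect.{ClassTag, classTag}

def f2[A: ClassTag](as: Seq[(A, A, A)]) = {
  val c = classTag[A].runtimeClass
  // isInstance is always false for a primitive class, so map Int to its boxed class
  def ok(x: Any) = c.isInstance(x) ||
    (c == classOf[Int] && x.isInstanceOf[java.lang.Integer])
  as filter { case (x, y, z) => ok(x) && ok(y) && ok(z) }
}

f2[Int](vs)   // List((1,2,3), (4,5,6)), on 2.10 as well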
Update (2018): my prayers were answered in Dotty (Type Lambdas), so the following Q&A is more "Scala 2.x"-related
Just a simple example from Scala:
scala> def f(x: Int) = x
f: (x: Int)Int
scala> (f _)(5)
res0: Int = 5
Let's make it generic:
scala> def f[T](x: T) = x
f: [T](x: T)T
scala> (f _)(5)
<console>:9: error: type mismatch;
found : Int(5)
required: Nothing
(f _)(5)
^
Let's look at the eta-expansion of a polymorphic method in Scala:
scala> f _
res2: Nothing => Nothing = <function1>
Comparison with Haskell:
Prelude> let f x = x
Prelude> f 5
5
Prelude> f "a"
"a"
Prelude> :t f
f :: t -> t
Haskell correctly infers the polymorphic type t -> t here.
More realistic example?
scala> identity _
res2: Nothing => Nothing = <function1>
Even more realistic:
scala> def f[T](l: List[T]) = l.head
f: [T](l: List[T])T
scala> f _
res3: List[Nothing] => Nothing = <function1>
You can't make an alias for identity; you have to write your own function. Things like [T,U](t: T, u: U) => t -> u (constructing a tuple) are impossible to use as values. More generally, if you want to pass a lambda that relies on a generic type (e.g. one that uses a generic function to create lists or tuples, or to modify them in some way), you can't do that.
So, how do we solve that problem? Any workaround, solution or reasoning?
P.S. I've used the term polymorphic lambda (instead of function), as a function is just a named lambda.
Only methods can be generic on the JVM/Scala, not values. You can make an anonymous instance that implements some interface (and duplicate it for every type-arity you want to work with):
trait ~>[A[_], B[_]] { // exists in scalaz
  def apply[T](a: A[T]): B[T]
}

type Id[A] = A // the identity type constructor (also provided by scalaz)

val f = new (List ~> Id) {
  def apply[T](a: List[T]) = a.head
}
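Hypothetical usage of the value above; f is a single value usable at every element type:

f(List(1, 2, 3))    // 1
f(List("a", "b"))   // "a"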
Or use shapeless' Poly, which supports more complicated type-cases. But yeah, it's a limitation and it requires working around.
P∀scal is a compiler plugin that provides more concise syntax for encoding polymorphic values as objects with a generic method.
The identity function, as a value, has type ∀A. A => A. To translate that into Scala, assume a trait
trait ForAll[F[_]] {
def apply[A]: F[A]
}
Then the identity function has type ForAll[λ[A => A => A]], where I use the kind-projector syntax, or, without kind-projector:
type IdFun[A] = A => A
type PolyId = ForAll[IdFun]
And now comes the P∀scal syntactic sugar:
val id = Λ[A](a => a) : PolyId
or equivalently
val id = ν[PolyId](a => a)
("ν" is the Greek lowercase letter "Nu", read "new")
These are really just shorthands for
new PolyId {
def apply[A] = a => a
}
Multiple type parameters and parameters of arbitrary kinds are supported by P∀scal, but you need a dedicated variation on the above ForAll trait for each variant.
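For illustration (a sketch), the encoded id above is instantiated at a type and then applied like any other function:

id.apply[Int](42)        // 42
id.apply[String]("hi")   // "hi"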
I really like @Travis Brown's solution:
import shapeless._
scala> Poly(identity _)
res2: shapeless.PolyDefns.~>[shapeless.Id,shapeless.Id] = fresh$macro$1$2$@797aa352
-
scala> def f[T](x: T) = x
f: [T](x: T)T
scala> Poly(f _)
res3: shapeless.PolyDefns.~>[shapeless.Id,shapeless.Id] = fresh$macro$2$2$@664ea816
-
scala> def f[T](l: List[T]) = l.head
f: [T](l: List[T])T
scala> val ff = Poly(f _)
ff: shapeless.PolyDefns.~>[List,shapeless.Id] = fresh$macro$3$2$@51254c50
scala> ff(List(1,2,3))
res5: shapeless.Id[Int] = 1
scala> ff(List("1","2","3"))
res6: shapeless.Id[String] = 1
The Poly constructor will (in some cases) give you eta-expansion into a Shapeless2 Poly1 function, which is (more or less) truly generic. However, it doesn't work for multiple parameters (even with multiple type parameters), so you have to "implement" Poly2 with the implicit + at approach (as @som-snytt suggested), something like:
object myF extends Poly2 {
implicit def caseA[T, U] = at[T, U]{ (a, b) => a -> b}
}
scala> myF(1,2)
res15: (Int, Int) = (1,2)
scala> myF("a",2)
res16: (String, Int) = (a,2)
P.S. I would really like to see this as a part of the language.
It seems that to do this, you will need to do a bit of type hinting to help the Scala type inference system.
def id[T] : T => T = identity _
So I guess if you try to pass identity as an argument to a function call, and the type of that parameter is generic, there should be no problem.
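For instance (a sketch, with a hypothetical applyTwice helper), when the expected type of the parameter is known at the call site, inference fills in the type argument and eta-expansion works, using the id above:

def applyTwice[A](f: A => A, a: A): A = f(f(a))

applyTwice(id[Int], 3)       // 3
applyTwice(identity[Int], 3) // also fine: the expected type A => A drives inference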
I want to write a short functional sum function for a List of BigDecimal, and tried this:
def sum(xs: List[BigDecimal]): BigDecimal = (0 /: xs) (_ + _)
But I got this error message:
<console>:7: error: overloaded method value + with alternatives:
(x: Int)Int <and>
(x: Char)Int <and>
(x: Short)Int <and>
(x: Byte)Int
cannot be applied to (BigDecimal)
def sum(xs: List[BigDecimal]): BigDecimal = (0 /: xs) (_ + _)
^
If I use Int instead, that function works. I guess this is because BigDecimal's operator overloading of +. What is a good workaround for BigDecimal?
The problem is in the initial value. The solution is quite simple:
def sum(xs: List[BigDecimal]): BigDecimal = (BigDecimal(0) /: xs) (_ + _)
foldLeft requires an initialization value.
def foldLeft[B](z: B)(f: (B, A) ⇒ B): B
This initial value (named z) fixes the result type B, so for _ + _ to typecheck here it must be a BigDecimal:
(BigDecimal(0) /: xs) { (sum: BigDecimal, x: BigDecimal) => sum+x }
// with syntax sugar
(BigDecimal(0) /: xs) { _+_ }
If you add an Int as initialization value the foldLeft will look like:
(0 /: xs) { (sum: Int, x: BigDecimal) => sum+x } // error: not possible to add a BigDecimal to Int
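Alternatively, a sketch: pin the fold's result type explicitly; the Int seed 0 is then adapted through the implicit scala.math.BigDecimal.int2bigDecimal view in BigDecimal's companion:

def sum(xs: List[BigDecimal]): BigDecimal = xs.foldLeft[BigDecimal](0)(_ + _)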
In a situation like this (where the accumulator has the same type as the items in the list) you can start the fold by adding the first and second items in the list—i.e., you don't necessarily need a starting value. Scala's reduce provides this kind of fold:
def sum(xs: List[BigDecimal]) = xs.reduce(_ + _)
There are also reduceLeft and reduceRight versions if your operation isn't associative.
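One caveat worth keeping in mind, as a sketch: unlike a fold with a seed, reduce throws an UnsupportedOperationException on an empty list, so guard for that case if it can occur:

def sum(xs: List[BigDecimal]): BigDecimal =
  if (xs.isEmpty) BigDecimal(0) else xs.reduce(_ + _)

// or keep the emptiness in the type:
def sumOpt(xs: List[BigDecimal]): Option[BigDecimal] = xs.reduceOption(_ + _)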
As others have already said, you get the error because of the initial value, so the correct fix is to wrap it in a BigDecimal. In addition, if you have a number of such functions and don't want to write BigDecimal(value) everywhere, you can create an implicit conversion like this:
implicit def intToBigDecimal(value: Int) = BigDecimal(value)
and from then on Scala will silently convert all your Ints (including constants) to BigDecimal. In fact, most programming languages silently convert integers to decimals, or even decimals to fractions (e.g. Lisps), so it seems a very logical move.
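Note, as a sketch, that the view only applies where a BigDecimal is expected, so a bare 0 seed in the fold would still be inferred as Int first; an ascription supplies the expected type:

implicit def intToBigDecimal(value: Int): BigDecimal = BigDecimal(value)

// the ascription gives the literal an expected type, which triggers the view
def sum(xs: List[BigDecimal]): BigDecimal = ((0: BigDecimal) /: xs)(_ + _)

sum(List(BigDecimal(1.5), BigDecimal(2.5)))  // 4.0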
I'm trying to dynamically filter (or collect) a list based on type:
If I do this specifying the type explicitly, it works fine
scala> var aList = List("one", 2, 3.3)
aList: List[Any] = List(one, 2, 3.3)
scala> aList.collect{case x:Int => x}
res10: List[Int] = List(2)
If I want to write a method to do this generically, then it doesn't:
scala> def collectType[T](l:List[Any]):List[T] = l.collect{case x:T => x}
warning: there were unchecked warnings; re-run with -unchecked for details
collectType: [T](l: List[Any])List[T]
scala> collectType[Int](aList)
res11: List[Int] = List(one, 2, 3.3)
scala> collectType[Double](aList)
res16: List[Double] = List(one, 2, 3.3)
scala> collectType[String](aList)
res14: List[String] = List(one, 2, 3.3)
I thought at first that it was naming the type 'Integer' rather than using Integer as the type, but that doesn't seem to be the case as:
collectType[Int](aList).foreach(x => println(x))
java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
It's as though the type check is deferred until it's forced to happen.
What am I missing about Types?
Is there a way to achieve what I want to achieve?
After reading through the linked questions, this is what I've come up with. Pretty simple now that it's been pointed out. Taggable is a trait which knows how to hold a map of tags for a class:
// assumes singleType from scala.reflect.Manifest is in scope
def matches[F <: Taggable](thing: Taggable)(implicit m: Manifest[F]): Boolean = {
  thing match {
    case e if (m >:> singleType(e)) => true
    case x => false
  }
}

def findByType[G <: Taggable](list: List[Taggable])(implicit m: Manifest[G]) = {
  list.collect { case x if (matches[G](x)) => x }
}
You are running into type erasure. At runtime your method is effectively
def collectType(l: List): List = l.collect { case x: Object => x }
so the pattern matches every element and nothing is filtered out; the ClassCastException only shows up later, when an element is actually used at the expected type.
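The usual fix, sketched below, is to carry the type to runtime with a ClassTag; with one in scope, the compiler turns the type pattern into a real runtime check:

import scala.reflect.ClassTag

def collectType[T: ClassTag](l: List[Any]): List[T] =
  l.collect { case x: T => x }  // the ClassTag makes this an actual runtime test

collectType[Int](List("one", 2, 3.3))     // List(2)
collectType[String](List("one", 2, 3.3))  // List(one)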