Consider these overloaded groupBy signatures:
def groupBy[K](f: T => K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])] = withScope {
  groupBy[K](f, defaultPartitioner(this))
}

def groupBy[K](
    f: T => K,
    numPartitions: Int)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])] = withScope {
  groupBy(f, new HashPartitioner(numPartitions))
}
A correct/working invocation of the former is as follows:
val groupedRdd = df.rdd.groupBy{ r => r.getString(r.fieldIndex("centroidId"))}
But I am unable to determine how to add the second parameter. Here is the obvious attempt - which gives syntax errors:
val groupedRdd = df.rdd.groupBy{ r => r.getString(r.fieldIndex("centroidId")),
nPartitions}
I also tried this (again with syntax errors):
val groupedRdd = df.rdd.groupBy({ r => r.getString(r.fieldIndex("centroidId"))},
nPartitions)
By the way, here is an approach that does work, but I am looking for the inline syntax:
def func(r: Row) = r.getString(r.fieldIndex("centroidId"))
val groupedRdd = df.rdd.groupBy( func _, nPartitions)
Since this is a generic method with type parameters T and K, Scala sometimes can't infer from the context what those types should be. In such cases you can help it by providing a type annotation, like this:
df.rdd.groupBy({ r: Row => r.getString(r.fieldIndex("centroidId")) }, nPartitions)
This is also the reason why this approach works:
def func(r: Row) = r.getString(r.fieldIndex("centroidId"))
val groupedRdd = df.rdd.groupBy(func _, nPartitions)
This fixes the type of r to Row, just like the annotated lambda above.
I am working on a project that uses the following code:
type S = Vector[String]   // a schema: the list of column names

case class R(f: Vector[String], s: S) {
  def apply(name: String): String = f(s indexOf name)            // look up a field by column name
  def apply(names: S): Vector[String] = names map (this apply _)
}

def processCSV(file: String)(yld: R => Unit): Unit = {
  val in = new Scanner(file)                   // Scanner is assumed to be defined elsewhere in the project
  val s = in.next('\n').split(",").toVector    // the first line holds the schema
  while (in.hasNext) {
    val f = s map (n => in.next(if (n == s.last) '\n' else ','))
    yld(R(f, s))
  }
}
def execOp(op: Operator)(yld: R => Unit): Unit = op match {
  case Scan(file, _, _, _) => processCSV(file)(yld)
}
My question is: what is the meaning of yld? Is it the same as yield? Can someone help me understand exactly how this yld works?
yield is a Scala keyword used with for-comprehensions.
yld in that code is something VERY different: it is just the name the author of the code gave to one of the parameters of the functions processCSV and execOp. Any other name could have been used for those parameters: fn, callback, cb, etc. There is nothing special about it. Given its type R => Unit, it is simply a function that takes an R as input and returns Unit (the equivalent of void in Java): essentially a callback where the work happens as a side effect.
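For instance, a hypothetical call site (assuming a file people.csv whose header row contains a name column) would pass a block that gets bound to yld and invoked once per parsed record:
// The block below becomes yld; processCSV calls it for every row it reads.
processCSV("people.csv") { r =>
  println(r("name"))   // uses R's single-name apply to look up a field by column name
}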
I have a generic map with values, some of which can be in turn lists of values.
I'm trying to process a given key and convert the results to the type expected by an outside caller, like this:
// A map with some values being other collections.
val map: Map[String, Any] = Map("foo" -> 1, "bar" -> Seq('a', 'b', 'a'))
// A generic method with a "specialization" for collections (pseudocode)
def cast[T](key: String) = map.get(key).map(_.asInstanceOf[T])
def cast[C <: Iterable[T]](key: String) = map.get(key).map(list => list.to[C].map(_.asInstanceOf[T]))
// Expected usage
cast[Int]("foo") // Should return 1:Int
cast[Set[Char]]("bar") // Should return Set[Char]('a', 'b')
This shows what I would like to do, but it does not work: the compiler complains (correctly) about two possible matches. I've also tried to make this a single function with some sort of pattern match on the type, to no avail.
I've been reading about @specialized, TypeTag, CanBuildFrom and other Scala functionality, but I failed to find a simple way to put it all together. The separate examples I've found address different pieces or rely on ugly workarounds, but nothing simply lets an external caller invoke cast and get an exception if the cast was invalid. Some of the material is also dated; I'm using Scala 2.10.5.
This appears to work, but it has some problems.
def cast[T](m: Map[String, Any], k: String): T = m(k) match {
  case x: T => x   // note: this pattern is unchecked, since T is erased at runtime
}
With the right input you get the correct output.
scala> cast[Int](map,"foo")
res18: Int = 1
scala> cast[Set[Char]](map,"bar")
res19: Set[Char] = Set(a, b)
But it throws if the type is wrong for the key or if the map has no such key (of course).
You can do this via implicit parameters:
val map: Map[String, Any] = Map("foo" -> 1, "bar" -> Set('a', 'b'))
abstract class Casts[B] { def cast(a: Any): B }

implicit val doubleCast = new Casts[Double] {
  override def cast(a: Any): Double = a match {
    case x: Int => x.toDouble
  }
}

implicit val intCast = new Casts[Int] {
  override def cast(a: Any): Int = a match {
    case x: Int    => x
    case x: Double => x.toInt
  }
}

implicit val seqCharCast = new Casts[Seq[Char]] {
  override def cast(a: Any): Seq[Char] = a match {
    case x: Set[Char] => x.toSeq
    case x: Seq[Char] => x
  }
}

def cast[T](key: String)(implicit p: Casts[T]) = p.cast(map(key))
println(cast[Double]("foo")) // <- 1.0
println(cast[Int]("foo")) // <- 1
println(cast[Seq[Char]]("bar")) // <- ArrayBuffer(a, b) which is Seq(a, b)
But you still need to enumerate every source-to-target combination, which is reasonable: Set('a', 'b').asInstanceOf[Seq[Char]] throws, so there is no universal cast and such cases have to be handled individually.
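To make that concrete, here is a small sketch (not from the original answer) contrasting the failing direct cast with the conversion that the seqCharCast instance performs:
import scala.util.Try

// Set does not implement Seq, so a blanket cast fails at runtime,
// while an explicit conversion succeeds.
val direct    = Try(Set('a', 'b').asInstanceOf[Seq[Char]])   // Failure(java.lang.ClassCastException)
val converted = Try(Set('a', 'b').toSeq)                     // Success(Seq(a, b))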
Still, it sounds like overkill, and you may want to review your approach from a more global perspective.
Given this snippet of code in Scala:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = (d1, d2) => d1 ++ d2
That can be shortened to:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = _ ++ _
What the code actually does is, in effect, give a name to the ++ operator of Map[VertexId, Factor]. So: is there a way to assign that operator to the variable directly? Like in this imaginary example:
val mapMerge : (Map[VertexId, Factor], Map[VertexId, Factor]) => Map[VertexId, Factor] = Map.++
And probably, with type inference, it would be enough to write:
val mapMerge = Map[VertexId,Factor].++
Thanks
Unfortunately, no, because the "operators" in Scala are instance methods — not functions from a typeclass, like in Haskell.
When you write _ ++ _, you are creating a new two-argument function (lambda) with unnamed parameters. This is equivalent to (a, b) => a ++ b, which is in turn equivalent to (a, b) => a.++(b), but not to (a, b) => SomeClass.++(a, b).
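To spell that out with a concrete type, here is a minimal sketch using Map[Int, String] in place of Map[VertexId, Factor]:
// All three values below are the same function; none of them refers to a standalone
// ++ function, because ++ is an instance method on the left-hand Map.
val f1: (Map[Int, String], Map[Int, String]) => Map[Int, String] = _ ++ _
val f2: (Map[Int, String], Map[Int, String]) => Map[Int, String] = (a, b) => a ++ b
val f3: (Map[Int, String], Map[Int, String]) => Map[Int, String] = (a, b) => a.++(b)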
You can emulate typeclasses by using implicit arguments (see the "typeclasses in Scala" presentation).
You can pass "operators" around as functions, which are not really operators, and you can have operators that look the same. See this example:
object Main {
  trait Concat[A] { def ++ (x: A, y: A): A }

  implicit object IntConcat extends Concat[Int] {
    override def ++ (x: Int, y: Int): Int = (x.toString + y.toString).toInt
  }

  implicit class ConcatOperators[A: Concat](x: A) {
    def ++ (y: A) = implicitly[Concat[A]].++(x, y)
  }

  def main(args: Array[String]): Unit = {
    val a = 1234
    val b = 765

    val c = a ++ b // Instance method from ConcatOperators — can be used with infix notation like other built-in "operators"
    println(c)

    val d = highOrderTest(a, b)(IntConcat.++) // 2-argument method from the typeclass instance
    println(d)

    // both calls to println print "1234765"
  }

  def highOrderTest[A](x: A, y: A)(fun: (A, A) => A) = fun(x, y)
}
Here we define a Concat typeclass, create an implementation for Int, and use an operator-like name for the method in the typeclass.
Because you can implement a typeclass for any type, you can use this trick with any type, but that requires writing quite a bit of supporting code, and sometimes it is not worth the result.
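For example, a hypothetical extra instance for String (not part of the code above) shows how the same pattern extends to another type:
// Hypothetical additional instance, placed next to IntConcat inside Main:
implicit object StringConcat extends Concat[String] {
  override def ++ (x: String, y: String): String = x + y
}

// It can then be passed around like any other two-argument function:
// highOrderTest("1234", "765")(StringConcat.++)   // "1234765"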
Let's consider a specific example. I have lots of functions that take a variable number of arguments, and return a Seq[T]. Say:
def nonNeg(start: Int, count: Int): Seq[Int] =
Iterator.from(start).take(count).toSeq
For each one of those functions, I need to create a "Java version" that returns a java.util.List[T] instead. I can create the "Java version" of the above function with:
def javaNonNeg(start: Int, count: Int): java.util.List[Int] =
nonNeg(start, count).asJava
This is somewhat verbose, as the parameter list is duplicated. Instead, I'd like to create a higher-order function that takes as a parameter a function of the form of nonNeg (any number and type of arguments, returning a Seq[T]) and returns a function which takes the same arguments but returns a java.util.List[T]. Assuming that function was called makeJava, I'd then be able to write:
def javaNonNeg = makeJava(nonNeg)
Can makeJava be written using Shapeless's ability to abstract over arity? If it can, how? And if not, why not, and how else can this be done?
It is possible to use Shapeless to avoid the boilerplate—you just need to turn the original method into a FunctionN using plain old eta expansion, then convert to a function taking a single HList argument, and then back to a FunctionN with the new result type:
import java.util.{ List => JList }
import shapeless._, ops.function._
import scala.collection.JavaConverters._

def makeJava[F, A, L, S, R](f: F)(implicit
  ftp: FnToProduct.Aux[F, L => S],
  ev: S <:< Seq[R],
  ffp: FnFromProduct[L => JList[R]]
) = ffp(l => ev(ftp(f)(l)).asJava)
And then:
scala> def nonNeg(start: Int, count: Int): Seq[Int] =
| Iterator.from(start).take(count).toSeq
nonNeg: (start: Int, count: Int)Seq[Int]
scala> val javaNonNeg = makeJava(nonNeg _)
javaNonNeg: (Int, Int) => java.util.List[Int] = <function2>
scala> javaNonNeg(1, 4)
res0: java.util.List[Int] = [1, 2, 3, 4]
javaNonNeg is a Function2, so from Java you can use javaNonNeg.apply(1, 4).
For two or more parameters (up to four in the code below) you can use implicit parameters to resolve the result type from the input parameter types:
sealed trait FuncRes[F] {
  type Param
  type Result
  def func: F => Param => Result
}

class Func[T, R](fn: T => R) {
  trait FR[F, P] extends FuncRes[F] { type Param = P; type Result = R }

  implicit def func2[T1, T2] = new FR[(T1, T2) => T, (T1, T2)] {
    def func = f => p => fn(f.tupled(p))
  }

  implicit def func3[T1, T2, T3] = new FR[(T1, T2, T3) => T, (T1, T2, T3)] {
    def func = f => p => fn(f.tupled(p))
  }

  implicit def func4[T1, T2, T3, T4] = new FR[(T1, T2, T3, T4) => T, (T1, T2, T3, T4)] {
    def func = f => p => fn(f.tupled(p))
  }

  def makeFunc[F](f: F)(implicit ev: FuncRes[F]): ev.Param => ev.Result =
    ev.func(f)
}
and then your def javaNonNeg = makeJava(nonNeg) function will look like this:
import scala.collection.JavaConverters._

object asJavaFunc extends Func((_: Seq[Int]).asJava)

import asJavaFunc._

def javaNonNeg = makeFunc(nonNeg _)
And of course it has some disadvantages, but generally it satisfies your needs.
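One of those disadvantages, as far as I can tell from the code above: makeFunc produces a function whose Param type is a tuple, so the arguments are passed as a single tuple at the call site:
javaNonNeg((1, 4))   // java.util.List[Int] = [1, 2, 3, 4]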
Suppose I have a list of functions as so:
val funcList = List(func1: A => T, func2: B => T, func3: C => T)
(where func1, et al. are defined elsewhere)
I want to write a method that will take a value and match it to the right function based on exact type (match a: A with func1: A => T) or throw an exception if there is no matching function.
Is there a simple way to do this?
This is similar to what a PartialFunction does, but I am not able to change the list of functions in funcList to PartialFunctions. I am thinking I have to do some kind of implicit conversion of the functions to a special class that knows the types it can handle and is able to pattern match against it (basically promoting those functions to a specialized PartialFunction). However, I can't figure out how to identify the "domain" of each function.
Thank you.
You cannot identify the domain of each function, because the types are erased at runtime. Look up type erasure if you want more information, but the short of it is that the information you want does not exist at runtime.
There are ways around type erasure, and you'll find plenty discussions on Stack Overflow itself. Some of them come down to storing the type information somewhere as a value, so that you can match on that.
Another possible solution is to simply forsake the use of parameterized types (generics, in Java parlance) in favor of your own custom types. That is, doing something like:
abstract class F1 extends (A => T)

object F1 {
  def apply(f: A => T): F1 = new F1 {
    def apply(n: A): T = f(n)
  }
}
And so on. Since F1 doesn't have type parameters, you can match on it, and you can create functions of this type easily. Say both A and T are Int, then you could do this, for example:
F1(_ * 2)
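A small hypothetical sketch (with both A and T fixed to Int, building on the F1 defined above) shows why this helps: F1 carries no type parameters, so nothing is erased and you can match on it safely:
// Assumes type A = Int and type T = Int, plus the F1 / F1.apply definitions above.
val fs: List[Any] = List(F1(_ * 2), "not a function")

def applyFirstF1(xs: List[Any], n: Int): Int =
  xs.collectFirst { case f: F1 => f(n) }   // no unchecked warning: F1 has no erased parameters
    .getOrElse(throw new IllegalArgumentException("no matching function"))

applyFirstF1(fs, 21)   // 42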
The usual way to work around type erasure is to use manifests. In your case, you can do the following:
abstract class TypedFunc[-A: Manifest, +R: Manifest] extends (A => R) {
  val retType: Manifest[_] = manifest[R]
  val argType: Manifest[_] = manifest[A]
}

object TypedFunc {
  implicit def apply[A: Manifest, R: Manifest](f: A => R): TypedFunc[A, R] = {
    f match {
      case tf: TypedFunc[A, R] => tf
      case _ => new TypedFunc[A, R] { final def apply(arg: A): R = f(arg) }
    }
  }
}

def applyFunc[A, R, T >: A : Manifest](funcs: Traversable[TypedFunc[A, R]])(arg: T): R = {
  funcs.find { f => f.argType <:< manifest[T] } match {
    case Some(f) => f(arg.asInstanceOf[A])
    case _       => sys.error("Could not find function with argument matching type " + manifest[T])
  }
}
val func1 = { s: String => s.length }
val func2 = { l: Long => l.toInt }
val func3 = { s: Symbol => s.name.length }
val funcList = List(func1: TypedFunc[String,Int], func2: TypedFunc[Long, Int], func3: TypedFunc[Symbol, Int])
Testing in the REPL:
scala> applyFunc( funcList )( 'hello )
res22: Int = 5
scala> applyFunc( funcList )( "azerty" )
res23: Int = 6
scala> applyFunc( funcList )( 123L )
res24: Int = 123
scala> applyFunc( funcList )( 123 )
java.lang.RuntimeException: Could not find function with argument matching type Int
at scala.sys.package$.error(package.scala:27)
at .applyFunc(<console>:27)
at .<init>(<console>:14)
...
I think you're misunderstanding how a List is typed. List takes a single type parameter, which is the type of all the elements of the list. When you write
val funcList = List(func1: A => T, func2: B => T, func3: C => T)
the compiler will infer a type like funcList : List[A with B with C => T].
This means that each function in funcList must accept a parameter that is an instance of A, B, and C all at once.
Apart from this, you can't (directly) match on function types due to type erasure.
What you could instead do is match on a itself, and call the appropriate function for the type:
a match {
  case x: A => func1(x)
  case x: B => func2(x)
  case x: C => func3(x)
  case _    => throw new Exception
}
(Of course, A, B, and C must remain distinct after type-erasure.)
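A hypothetical illustration of that caveat: if two of the types erase to the same runtime class, the corresponding cases cannot be told apart:
// List[Int] and List[String] both erase to List, so the first case matches
// either of them and the second case can never be selected.
def describe(a: Any): String = a match {
  case _: List[Int]    => "ints?"     // matches any List, with an unchecked warning
  case _: List[String] => "strings"   // never reached in practice
  case _               => "other"
}

describe(List("a", "b"))   // "ints?" -- the list of strings falls into the first case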
If you need it to be dynamic, you're basically using reflection. Unfortunately Scala's reflection facilities are in flux, with version 2.10 released a few weeks ago, so there's less documentation for the current way of doing it; see "How do the new Scala TypeTags improve the (deprecated) Manifests?".