How can I make a function generic on an MLReader - scala

I am working in Spark 1.6.3. Here are two functions that do the same thing:
def modelFromBytesCV(modelArray: Array[Byte]): CountVectorizerModel = {
  val tempPath: Path = KAZOO_TEMP_DIR.resolve(s"model_${System.currentTimeMillis()}")
  Files.write(tempPath, modelArray)
  CountVectorizerModel.read.load(tempPath.toString)
}

def modelFromBytesIDF(modelArray: Array[Byte]): IDFModel = {
  val tempPath: Path = KAZOO_TEMP_DIR.resolve(s"model_${System.currentTimeMillis()}")
  Files.write(tempPath, modelArray)
  IDFModel.read.load(tempPath.toString)
}
I would like to make these functions generic. What I am hung up on is that the common trait between the CountVectorizerModel and IDFModel companion objects is MLReadable[T], which itself must take as a type parameter either CountVectorizerModel or IDFModel. This feels like a recursive parent-class loop that I can't find a way out of.
By comparison, the generic model writer is easy, because MLWritable is a common trait extended by all the models I am interested in:
def modelToBytes[M <: MLWritable](model: M): Array[Byte] = {
  val tempPath: Path = KAZOO_TEMP_DIR.resolve(s"model_${System.currentTimeMillis()}")
  model.write.overwrite().save(tempPath.toString)
  Files.readAllBytes(tempPath)
}
How can I make a generic reader that will turn a byte array back into a spark-ml model?

To make it work you'll need access to a specific MLReadable object.
import org.apache.spark.ml.util.MLReadable
def modelFromBytes[M](obj: MLReadable[M], modelArray: Array[Byte]): M = {
  val tempPath: Path = ???
  ...
  obj.read.load(tempPath.toString)
}
which could be later used as:
val bytes: Array[Byte] = ???
modelFromBytes(CountVectorizerModel, bytes)
Note that, despite the first appearance, there is nothing recursive here - MLReadable[M] refers to the companion object, not the class as such. So, for example, the CountVectorizerModel object is MLReadable, while the CountVectorizerModel class isn't.
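For completeness, here is a minimal sketch of what the filled-in generic reader could look like, mirroring the temp-file approach from the question (tempDir below is a hypothetical stand-in for the question's KAZOO_TEMP_DIR):
import java.nio.file.{Files, Path}
import org.apache.spark.ml.util.MLReadable

// Hypothetical stand-in for the question's KAZOO_TEMP_DIR.
val tempDir: Path = Files.createTempDirectory("kazoo_models")

def modelFromBytes[M](obj: MLReadable[M], modelArray: Array[Byte]): M = {
  val tempPath: Path = tempDir.resolve(s"model_${System.currentTimeMillis()}")
  // Write the serialized model bytes to disk, then let the companion's reader load them.
  Files.write(tempPath, modelArray)
  obj.read.load(tempPath.toString)
}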
Internally, Spark's MLReader handles this in a different way - it creates an instance of the class using reflection and then sets its Params. However, this path won't be very useful for you here*.
If compatibility with the current API is required, you can try making readable object implicit:
def modelFromBytes[M](modelArray: Array[Byte])(implicit obj: MLReadable[M]): M = {
...
}
and then
implicit val readable: MLReadable[CountVectorizerModel] = CountVectorizerModel
modelFromBytes[CountVectorizerModel](bytes)
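If you go this route, one option (a hedged sketch; the object name ModelReadables is hypothetical) is to collect the readable companions in one place, so callers only need a single import:
import org.apache.spark.ml.feature.{CountVectorizerModel, IDFModel}
import org.apache.spark.ml.util.MLReadable

object ModelReadables {
  // The companion objects themselves are the MLReadable instances.
  implicit val cvReadable: MLReadable[CountVectorizerModel] = CountVectorizerModel
  implicit val idfReadable: MLReadable[IDFModel] = IDFModel
}

import ModelReadables._
val cv = modelFromBytes[CountVectorizerModel](bytes)
val idf = modelFromBytes[IDFModel](bytes)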
* Technically speaking, it is possible to get the companion object via reflection
def modelFromBytesCV[M <: MLWritable](
    modelArray: Array[Byte])(implicit ct: ClassTag[M]): M = {
  val tempPath: Path = ???
  ...
  val cls = Class.forName(ct.runtimeClass.getName + "$")
  cls.getField("MODULE$").get(cls).asInstanceOf[MLReadable[M]]
    .read.load(tempPath.toString)
}
but I don't think that is a path worth exploring here. In particular, we cannot really provide strict type bounds here - using MLWritable is a hack to limit human errors, but it is rather useless for the compiler.

Related

Is there any way, in Scala, to get the Singleton type of something from the more general type?

I have a situation where I'm trying to use implicit resolution on a singleton type. This works perfectly fine if I know that singleton type at compile time:
object Main {
  type SS = String with Singleton

  trait Entry[S <: SS] {
    type out
    val value: out
  }

  implicit val e1 = new Entry["S"] {
    type out = Int
    val value = 3
  }

  implicit val e2 = new Entry["T"] {
    type out = String
    val value = "ABC"
  }

  def resolve[X <: SS](s: X)(implicit x: Entry[X]): x.value.type = {
    x.value
  }

  def main(args: Array[String]): Unit = {
    resolve("S") //implicit found! No problem
  }
}
However, if I don't know this type at compile time, then I run into issues.
def main(args: Array[String]): Unit = {
val string = StdIn.readLine()
resolve(string) //Can't find implicit because it doesn't know the singleton type at runtime.
}
Is there anyway I can get around this? Maybe some method that takes a String and returns the singleton type of that string?
def getSingletonType[T <: SS](string: String): T = ???
Then maybe I could do
def main(args: Array[String]): Unit = {
val string = StdIn.readLine()
resolve(getSingletonType(string))
}
Or is this just not possible? Maybe you can only do this sort of thing if you know all of the information at compile-time?
If you knew about all possible implementations of Entry at compile time - which would be possible only if it were sealed - then you could use a macro to create a map/partial function String -> Entry[_].
Since this is open to extending, I'm afraid at best some runtime reflection would have to scan the whole classpath to find all possible implementations.
But even then you would have to embed this String literal somehow into each implementation, because JVM bytecode knows nothing about the mapping between singleton types and implementations - only the Scala compiler does. And then use that to find whether, among all implementations, there is one (and exactly one) that matches your value - with implicits, if two of them appear in the same scope at once, compilation fails, but you can have more than one implementation as long as they don't appear together in the same scope. Runtime reflection would be global, so it wouldn't be able to avoid conflicts.
So no, there is no good solution for making this compile-time dispatch dynamic. You could create such a dispatch yourself, e.g. by writing a Map[String, Entry[_]] and using its get function to handle missing pieces.
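For illustration, a minimal sketch of that manual dispatch, reusing Entry, e1 and e2 from the question (entries and resolveDynamic are hypothetical names):
// Entry[_] erases the singleton type, so value comes back as Any.
val entries: Map[String, Main.Entry[_]] = Map(
  "S" -> Main.e1,
  "T" -> Main.e2
)

def resolveDynamic(s: String): Option[Any] =
  entries.get(s).map(_.value)

resolveDynamic("S") // Some(3)
resolveDynamic("X") // None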
Normally, implicits are resolved at compile time. But val string = StdIn.readLine() becomes known only at runtime. In principle, you can postpone implicit resolution till runtime, but then you'll be able to apply the results of such resolution only at runtime, not at compile time (static types etc.)
object Entry {
  implicit val e1 = ...
  implicit val e2 = ...
}

import scala.reflect.runtime.universe._
import scala.reflect.runtime
import scala.tools.reflect.ToolBox

val toolbox = ToolBox(runtime.currentMirror).mkToolBox()

def resolve(s: String): Any = {
  val typ = appliedType(
    typeOf[Entry[_]].typeConstructor,
    internal.constantType(Constant(s))
  )
  val instanceTree = toolbox.inferImplicitValue(typ, silent = false)
  val instance = toolbox.eval(toolbox.untypecheck(instanceTree)).asInstanceOf[Entry[_]]
  instance.value
}
resolve("S") // 3
val string = StdIn.readLine()
resolve(string)
// 3 if you enter S
// ABC if you enter T
// "scala.tools.reflect.ToolBoxError: implicit search has failed" otherwise
Please notice that I put the implicits into the companion object of the type class in order to make them available in the implicit scope and therefore in the toolbox scope. Otherwise the code should be modified slightly:
object EntryImplicits {
  implicit val e1 = ...
  implicit val e2 = ...
}

// val instanceTree = toolbox.inferImplicitValue(typ, silent = false)
// should be replaced with
val instanceTree =
  q"""
    import path.to.EntryImplicits._
    implicitly[$typ]
  """
In your code import path.to.EntryImplicits._ is import Main._.
See also: Load Dataset from Dynamically generated Case Class

How to save a Type or TypeTag to a val for later use?

I would like to save a Type or TypeTag in a val for later use. At this time, I am having to specify a type in several locations in a block of code. I do not need to parameterize the code because only one type will be used. This is more of a curiosity than a necessity.
I tried using typeOf, classOf, getClass, and several other forms of accessing the class and type. The solution is likely simple but my knowledge of Scala typing or type references is missing this concept.
object Example extends App {
  import scala.reflect.runtime.universe._

  object TestClass { val str = "..." }
  case class TestClass() { val word = ",,," }

  def printType[A: TypeTag](): Unit = println(typeOf[A])

  printType[List[Int]]() //prints 'List[Int]'
  printType[TestClass]() //prints 'Example.TestClass'

  val typeOfCompanion: ??? = ??? //TODO what goes here?
  val typeOfCaseClass: ??? = ??? //TODO what goes here?

  printType[typeOfCompanion]() //TODO should print something like 'Example.TestClass'
  printType[typeOfCaseClass]() //TODO should print something like 'Example.TestClass'
}
The solution should be able to save a Type or TypeTag, or whatever the solution turns out to be. Then, pass typeOfCompanion or typeOfCaseClass to the printing call, like printType[typeOfCompanion](). Changing the printing portion of the code may be required; I am not certain.
You have to be more explicit here:
import scala.reflect.runtime.universe._
def printType(a: TypeTag[_]): Unit = println(a)
val typeOfCompanion = typeTag[List[Int]]
printType(typeOfCompanion)
def printType[A: TypeTag](): Unit = println(typeOf[A])
is exactly the same as
def printType[A]()(implicit a: TypeTag[A]): Unit = println(typeOf[A])
(except for the parameter name). So it can be called as
val listTypeTag /* : TypeTag[List[Int]] */ = typeTag[List[Int]]
printType()(listTypeTag)
(you can remove the empty parameter list from printType if you want).
For the companion, you need to use a singleton type:
val companionTag = typeTag[TestClass.type]
val caseClassTag = typeTag[TestClass]
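As a quick check (a hedged sketch using the question's original printType), the saved tags can then be supplied explicitly:
printType()(companionTag) // prints something like 'Example.TestClass.type'
printType()(caseClassTag) // prints 'Example.TestClass'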

Calling method via reflection in Scala

I want to call an arbitrary public method of an arbitrary object via reflection. I.e., let's say I want to write a method extractMethod to be used like:
class User { def setAvatar(avatar: Avatar): Unit = …; … }
val m = extractMethod(someUser, "setAvatar")
m(someAvatar)
From the Reflection Overview document in the Scala docs, I see the following direct way to do that:
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

def extractMethod[Stuff: ClassTag: TypeTag](
    stuff: Stuff,
    methodName: String): MethodMirror =
{
  val stuffTypeTag = typeTag[Stuff]
  val mirror = stuffTypeTag.mirror
  val stuffType = stuffTypeTag.tpe
  val methodSymbol = stuffType
    .member(TermName(methodName)).asMethod
  mirror.reflect(stuff)
    .reflectMethod(methodSymbol)
}
However, what bothers me about this solution is that I need to pass implicit ClassTag[Stuff] and TypeTag[Stuff] parameters (the first one is needed for calling reflect, the second one for getting stuffType). This may be quite cumbersome, especially if extractMethod is called from generics that are called from generics, and so on. I'd accept this as a necessity in a language that strongly lacks runtime type information, but Scala is based on the JRE, which allows doing the following:
def extractMethod[Stuff](
    stuff: Stuff,
    methodName: String,
    parameterTypes: Array[Class[_]]): Seq[Object] => Object =
{
  // Look up the public method by name and parameter types, then bind it to `stuff`.
  val unboundMethod = stuff.getClass()
    .getMethod(methodName, parameterTypes: _*)
  arguments => unboundMethod.invoke(stuff, arguments: _*)
}
I understand that Scala reflection allows getting more information than basic Java reflection. Still, here I just need to call a method. Is there a way to somehow reduce the requirements (e.g. these ClassTag, TypeTag) of the Scala-reflection-based extractMethod version (without falling back to pure-Java reflection), assuming that performance doesn't matter to me?
Yes, there is.
First, according to this answer, TypeTag[Stuff] is a strictly stronger requirement than ClassTag[Stuff]. Although we don't automatically get an implicit ClassTag[Stuff] from an implicit TypeTag[Stuff], we can construct it manually as ClassTag[Stuff](stuffTypeTag.mirror.runtimeClass(stuffTypeTag.tpe)) and then pass it, implicitly or explicitly, to reflect, which needs it:
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

def extractMethod[Stuff: TypeTag](
    stuff: Stuff,
    methodName: String): MethodMirror =
{
  val stuffTypeTag = typeTag[Stuff]
  val mirror = stuffTypeTag.mirror
  val stuffType = stuffTypeTag.tpe
  val stuffClassTag = ClassTag[Stuff](mirror.runtimeClass(stuffType))
  val methodSymbol = stuffType
    .member(TermName(methodName)).asMethod
  mirror.reflect(stuff)(stuffClassTag)
    .reflectMethod(methodSymbol)
}
Second, mirror and stuffType can be obtained from stuff.getClass():
import scala.reflect.ClassTag
import scala.reflect.runtime.universe._

def extractMethod[Stuff](stuff: Stuff, methodName: String): MethodMirror = {
  val stuffClass = stuff.getClass()
  val mirror = runtimeMirror(stuffClass.getClassLoader)
  val stuffType = mirror.classSymbol(stuffClass).toType
  val stuffClassTag = ClassTag[Stuff](mirror.runtimeClass(stuffType))
  val methodSymbol = stuffType
    .member(TermName(methodName)).asMethod
  mirror.reflect(stuff)(stuffClassTag)
    .reflectMethod(methodSymbol)
}
Therefore we obtain Scala-style reflection entities (ultimately a MethodMirror) without requiring a ClassTag and/or TypeTag to be passed, explicitly or implicitly, from the caller. I'm not sure, however, how it compares with the approaches described in the question (i.e. passing tags from outside, and pure Java) in terms of performance.
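As a sanity check, a hedged usage sketch of the last version (Greeter is a hypothetical class, not part of the question):
class Greeter { def greet(name: String): String = s"Hello, $name" }

val greeter = new Greeter
val m = extractMethod(greeter, "greet")  // MethodMirror bound to greeter.greet
println(m("World"))                      // Hello, World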

Trouble using Implicit Ordered with PriorityQueue (Scala)

I'm trying to create a data structure that has a PriorityQueue in it. I've succeeded in making a non-generic version of it. I can tell it works because it solves the A.I. problem I have.
Here is a snippet of it:
class ProntoPriorityQueue { //TODO make generic
  implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
    def compare(other: Node) = node.compare(other)
  }
  val hashSet = new HashSet[Node]
  val priorityQueue = new PriorityQueue[Node]()
  ...
I'm trying to make it generic, but if I use this version it stops solving the problem:
class PQ[T <% Ordered[T]] {
  //[T]()(implicit val ord: T => Ordered[T]) {
  //[T]()(implicit val ord: Ordering[T]) {
  val hashSet = new HashSet[T]
  val priorityQueue = new PriorityQueue[T]
  ...
I've also tried what's commented out instead of using [T <% Ordered[T]]
Here is the code that calls PQ:
//the following def is commented out while using ProntoPriorityQueue
implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
def compare(other: Node) = node.compare(other)
} //I've also tried making this return an Ordering[Node]
val frontier = new PQ[Node] //new ProntoPriorityQueue
//have also tried (not together):
val frontier = new PQ[Node]()(orderedNode)
I've also tried moving the implicit def into the Node object (and importing it), but essentially the same problem.
What am I doing wrong in the generic version? Where should I put the implicit?
Solution
The problem was not with my implicit definition. The problem was that the implicit ordering was being picked up by a Set that was automatically generated in a for(...) yield(...) statement. This caused a problem where the yielded set only contained one state.
What's wrong with simply defining an Ordering on your Node (Ordering[Node]) and using the already-generic Scala PriorityQueue?
As a general rule, it's better to work with Ordering[T] than T <: Ordered[T] or T <% Ordered[T]. Conceptually, Ordered[T] is an intrinsic (inherited or implemented) property of the type itself. Notably, a type can have only one intrinsic ordering relationship defined this way. Ordering[T] is an external specification of the ordering relationship. There can be any number of different Ordering[T] instances.
Also, if you're not already aware, you should know that the difference between T <: U and T <% U is that while the former includes only nominal subtype relations (actual inheritance), the latter also includes the application of implicit conversions that yield a value conforming to the type bound.
So if you want to use Node <% Ordered[Node] and you don't have a compare method defined in the class, an implicit conversion will be applied every time a comparison needs to be made. Conversely, if your type has its own compare, the implicit conversion will never be applied and you'll be stuck with that "built-in" ordering.
Addendum
I'll give a few examples based on a class, call it CIString, that simply encapsulates a String and implements case-insensitive ordering.
import scala.collection.immutable.TreeSet

/* Here's how it would be with direct implementation of `Ordered` */
class CIString1(val s: String)
  extends Ordered[CIString1]
{
  private val lowerS = s.toLowerCase
  def compare(other: CIString1) = lowerS.compareTo(other.lowerS)
}

/* An uninteresting, empty ordered set of CIString1
   (fails without the `extends` clause) */
val os1 = TreeSet[CIString1]()

/* Here's how it would look with ordering external to `CIString2`
   using an implicit conversion to `Ordered` */
class CIString2(val s: String) {
  val lowerS = s.toLowerCase
}

class CIString2O(ciS: CIString2)
  extends Ordered[CIString2]
{
  def compare(other: CIString2) = ciS.lowerS.compareTo(other.lowerS)
}

implicit def cis2ciso(ciS: CIString2) = new CIString2O(ciS)

/* An uninteresting, empty ordered set of CIString2
   (fails without the implicit conversion) */
val os2 = TreeSet[CIString2]()

/* Here's how it would look with ordering external to `CIString3`
   using an `Ordering` */
class CIString3(val s: String) {
  val lowerS = s.toLowerCase
}

/* The implicit object could be replaced by
   a class and an implicit val of that class */
implicit object CIString3Ordering
  extends Ordering[CIString3]
{
  def compare(a: CIString3, b: CIString3): Int = a.lowerS.compareTo(b.lowerS)
}

/* An uninteresting, empty ordered set of CIString3
   (fails without the implicit object) */
val os3 = TreeSet[CIString3]()
Well, one possible problem is that your Ordered[Node] is not a Node:
implicit def orderedNode(node: Node): Ordered[Node] = new Ordered[Node] {
def compare(other: Node) = node.compare(other)
}
I'd try with an Ordering[Node] instead, which you say you tried but there isn't much more information about. PQ would be declared as PQ[T : Ordering].
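To make that concrete, here is a minimal sketch of the Ordering-based version using the standard mutable collections; the Node below is a hypothetical stand-in with a single cost field, not the asker's actual class:
import scala.collection.mutable.{HashSet, PriorityQueue}

case class Node(cost: Int)

// Reverse the natural ordering so the lowest-cost node is dequeued first.
implicit val nodeOrdering: Ordering[Node] = Ordering.by((n: Node) => -n.cost)

class PQ[T: Ordering] {
  val hashSet = new HashSet[T]
  val priorityQueue = new PriorityQueue[T]
}

val frontier = new PQ[Node]
frontier.priorityQueue.enqueue(Node(5), Node(1), Node(3))
frontier.priorityQueue.dequeue() // Node(1)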

Implicit conversion, import required or not?

I write
object MyString {
  implicit def stringToMyString(s: String) = new MyString(s)
}

class MyString(str: String) {
  def camelize = str.split("_").map(_.capitalize).mkString
  override def toString = str
}

object Parse {
  def main(args: Array[String]) {
    val x = "active_record".camelize
    // ...
  }
}
in my program. This causes a compile error. After I inserted
import MyString.stringToMyString
Then it works.
From Odersky's Programming in Scala I got that implicit conversions in the companion object of the source or expected target types don't need to be imported.
True enough. Now, the method camelize is defined on the class MyString, and, indeed, there is an implicit conversion to MyString inside its companion object. However, there is nothing in the code telling the compiler that MyString is the expected target type.
If, instead, you wrote this:
val x = ("active_record": MyString).camelize
then it would work, because the compiler would know you expect "active_record" to be a MyString, making it look up the implicit conversion inside object MyString.
This might look a bit restrictive, but it actually works in a number of places. Say, for instance, you had:
class Fraction(num: Int, denom: Int) {
  ...
  def +(b: Fraction) = ...
  ...
}
And then you had a code like this:
val x: Fraction = ...
val y = x + 5
Now, x does have a + method, whose expected type is Fraction. So the compiler would look, here, for an implicit conversion from Int to Fraction inside the object Fraction (and inside the object Int, if there was one, since that's the source type).
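A minimal, hedged sketch of that Fraction scenario (the arithmetic in + and the toString are just for illustration):
class Fraction(val num: Int, val denom: Int) {
  def +(b: Fraction) = new Fraction(num * b.denom + b.num * denom, denom * b.denom)
  override def toString = s"$num/$denom"
}

object Fraction {
  // The conversion lives in the companion, so no import is needed when the
  // expected type of the argument is Fraction.
  implicit def intToFraction(n: Int): Fraction = new Fraction(n, 1)
}

val x = new Fraction(1, 2)
val y = x + 5   // 5 is converted via Fraction.intToFraction
println(y)      // 11/2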
In this situation you need the import because the compiler doesn't know where you pulled the camelize method from. If the type is clear, it will compile without the import:
object Parse {
  def foo(s: MyString) = s.camelize

  def main(args: Array[String]) {
    val x = foo("active_record")
    println(x.toString)
  }
}
See Pimp my library pattern, based on Martin's article:
Note that it is not possible to put defs at the top level, so you can’t define an implicit conversion with global scope. The solution is to place the def inside an object, and then import it, i.e.
object Implicits {
implicit def listExtensions[A](xs : List[A]) = new ListExtensions(xs)
}
And then at the top of each source file, along with your other imports:
import Implicits._
I tried the Rational class example in the Programming in Scala book and put an implicit method in its companion object:
object Rational {
implicit def intToRational(num: Int) =
new Rational(num)
}
but the code
2 + new Rational(1, 2)
does not work. For the conversion to happen, the single-identifier rule applies, i.e., you need to import the implicit method into scope even though it is defined in the companion object.
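A hedged sketch of the fix, assuming the book's Rational(n, d) class with an auxiliary single-argument constructor:
// Bringing the conversion into scope as a single identifier makes it applicable
// to the receiver of +, so the expression compiles.
import Rational.intToRational

val sum = 2 + new Rational(1, 2)   // intToRational(2) + Rational(1, 2)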