Background
Here's my situation: I'm trying to create a class that filters an RDD based on some feature of the contents, but that feature can be different in different scenarios so I'd like to parameterize that with a function. Unfortunately, I seem to be running into issues with the way Scala captures its closures. Even though my function is serializable, the class is not.
The example in the Spark source on closure cleaning seems to suggest my situation can't be solved, but I'm convinced there's a way to achieve what I'm trying to do by creating the right (smaller) closure.
My Code
class MyFilter(getFeature: Element => String, other: NonSerializable) {
  def filter(rdd: RDD[Element]): RDD[Element] = {
    // All my complicated logic I want to share
    rdd.filter { elem => getFeature(elem) == "myTargetString" }
  }
}
Simplified Example
class Foo(f: Int => Double, rdd: RDD[Int]) {
def go(data: RDD[Int]) = data.map(f)
}
val works = new Foo(_.toDouble, otherRdd)
works.go(myRdd).collect() // works
val myMap = Map(1 -> 10d)
val complicatedButSerializableFunc: Int => Double = x => myMap.getOrElse(x, 0)
val doesntWork = new Foo(complicatedButSerializableFunc, otherRdd)
doesntWork.go(myRdd).collect() // craps out
org.apache.spark.SparkException: Task not serializable
Caused by: java.io.NotSerializableException: $iwC$$iwC$Foo
Serialization stack:
- object not serializable (class: $iwC$$iwC$Foo, value: $iwC$$iwC$Foo@61e33118)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, name: foo, type: class $iwC$$iwC$Foo)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC, $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC@47d6a31a)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1, name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1, <function1>)
// Even though
val out = new ObjectOutputStream(new FileOutputStream("test.obj"))
out.writeObject(complicatedButSerializableFunc) // works
Questions
Why does the first simplified example not attempt to serialize all of Foo but the second one does?
How can I get the reference to my serializable function without including a reference to Foo in my closure?
Found the answer with the help of this article.
Essentially, when creating the closure for a given function, Scala will pull in the entire enclosing object if any of its fields is referenced (if someone has a good explanation for why this doesn't happen in the first simple example, I'll accept that answer). The solution is to pass the serializable value to a different function so that only a minimal reference is kept, very similar to the ol' JavaScript for-loop idiom for event listeners.
Example
def enclose[E, R](enclosed: E)(func: E => R): R = func(enclosed)
class Foo(f: Int => Double, somethingNonserializable: RDD[String]) {
def go(data: RDD[Int]) = enclose(f) { actualFunction => data.map(actualFunction) }
}
Or with a JS-style self-executing anonymous function:
def go(data: RDD[Int]) = ((actualFunction: Int => Double) => data.map(actualFunction))(f)
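For what it's worth, the same effect can often be had with the common Spark idiom of copying the field into a local val before building the closure, so the closure captures that val rather than the enclosing instance. A sketch along the lines of the Foo example above:
class Foo(f: Int => Double, somethingNonserializable: RDD[String]) {
  def go(data: RDD[Int]) = {
    val localF = f   // local copy of the field
    data.map(localF) // only the function value itself needs to be serialized
  }
}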
Related
Say I define the following case class:
case class C(i: Int) {
lazy val incremented = copy(i = i + 1)
}
And then try to serialize it to json:
val mapper = new ObjectMapper()
mapper.registerModule(DefaultScalaModule)
val out = new StringWriter
mapper.writeValue(out, C(4))
val json = out.toString()
println("Json is: " + json)
It will throw the following exception:
Exception in thread "main" com.fasterxml.jackson.databind.JsonMappingException: Infinite recursion (StackOverflowError) (through reference chain: C["incremented"]->C["incremented"]->C["incremented"]->C["incremented"]->C["incremented"]->C["incremented"]->C["incremented"]-> ...)
...
I don't know why it is trying to serialize the lazy val by default in the first place; that does not seem like the logical approach to me.
And can I disable this feature?
This happens because Jackson is designed for Java. Specifically, note that:
Java has no idea of a lazy val
Java's normal semantics around fields and constructors don't allow partitioning fields into "needed for construction" and "derived after construction" (neither of those is a technical term) the way Scala's combination of vals in the default constructor (implicitly present in a case class) and vals in a class's body does
The consequence of the second is that (except for beans, sometimes) Java-oriented serialization approaches tend to assume that anything which is a field in the object (including private fields, since the Java idiom is to make fields private by default) needs to be serialized, with the ability to opt out through @transient annotations.
The first, in turn, means that lazy vals are implemented by the compiler in a way that includes a private field.
Thus to a Java-oriented serializer like Jackson, a lazy val without an @transient annotation gets serialized.
Scala-oriented serialization approaches (e.g. circe, play-json, etc.) tend to serialize case classes by only serializing the constructor parameters.
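For illustration, a minimal sketch of that behaviour, assuming circe with circe-generic on the classpath (the snippet is not from the original question):
import io.circe.generic.auto._
import io.circe.syntax._

case class C(i: Int) {
  lazy val incremented: C = copy(i = i + 1)
}

// Derivation is driven by the constructor parameters, so only `i` is encoded
// and the recursive lazy val is never touched.
println(C(4).asJson.noSpaces) // {"i":4}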
The solution I found was to use json4s for my serialization rather than Jackson Databind. My issue arose using Akka Cluster, so I had to add a custom serializer to my project. For reference, here is my complete implementation:
import java.nio.charset.StandardCharsets

import akka.actor.ExtendedActorSystem
import akka.actor.typed.{ActorRef, ActorRefResolver}
import akka.actor.typed.scaladsl.adapter._
import akka.serialization.Serializer
import org.json4s.{CustomSerializer, DefaultFormats}
import org.json4s.JsonAST.JString
// read/write from the json4s native backend are assumed here; the jackson backend works the same way
import org.json4s.native.Serialization.{read, write}

import scala.reflect.ManifestFactory

class Json4sSerializer(system: ExtendedActorSystem) extends Serializer {
  private val actorRefResolver = ActorRefResolver(system.toTyped)

  // Teach json4s how to (de)serialize typed ActorRefs as plain strings
  object ActorRefSerializer extends CustomSerializer[ActorRef[_]](format => (
    {
      case JString(str) =>
        actorRefResolver.resolveActorRef[AnyRef](str)
    },
    {
      case actorRef: ActorRef[_] =>
        JString(actorRefResolver.toSerializationFormat(actorRef))
    }
  ))

  implicit private val formats = DefaultFormats + ActorRefSerializer

  def includeManifest: Boolean = true
  def identifier = 1234567

  def toBinary(obj: AnyRef): Array[Byte] = {
    write(obj).getBytes(StandardCharsets.UTF_8)
  }

  def fromBinary(bytes: Array[Byte], clazz: Option[Class[_]]): AnyRef = clazz match {
    case Some(cls) =>
      read[AnyRef](new String(bytes, StandardCharsets.UTF_8))(formats, ManifestFactory.classType(cls))
    case None =>
      throw new RuntimeException("Specified includeManifest but it was never passed")
  }
}
You can't serialize that class because the value is infinitely recursive (hence the stack overflow). Specifically, the value of incremented for C(4) is an instance of C(5). The value of incremented for C(5) is C(6). The value of incremented for C(6) is C(7) and so on...
Since an instance of C(n) contains an instance of C(n+1), it can never be fully serialized.
If you don't want a field to appear in the JSON, make it a function:
case class C(i: Int) {
def incremented = copy(i = i + 1)
}
The root of this problem is trying to serialise a class that also implements application logic, which breaches the Single Responsibility Principle (the S in SOLID).
It is better to have distinct classes for serialisation and populate them from the application data as necessary. This allows different forms of serialisation to be used without having to change the application logic.
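A rough sketch of that separation, reusing the Jackson setup from the question (the CView name and fromDomain helper are made up for the example):
// Domain type: keeps its application logic.
case class C(i: Int) {
  lazy val incremented = copy(i = i + 1)
}

// Serialisation-only view: carries exactly what should go on the wire.
case class CView(i: Int)

object CView {
  def fromDomain(c: C): CView = CView(c.i)
}

// mapper.writeValue(out, CView.fromDomain(C(4))) now has nothing recursive to follow.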
The code fragment provided is a made-up, minimal example just to demonstrate the issue; it is not related to actual business-logic types.
In the code below we have a nested Entry type inside Registry type.
class Registry[T](name: String) {
case class Entry(id: Long, value: T)
}
That makes sense because Entries of different Registries are essentially different, incomparable types.
Then we may have an implicit Ops class, for example one used in tests, which binds our registries to some test storage implementation, a simple mutable map:
object testOps {
  import scala.collection.mutable
  type TestStorage[T] = mutable.Map[Long, T]

  implicit class RegistryOps[T](val self: Registry[T])(
    implicit storage: TestStorage[T]
  ) {
    def getById(id: Long): Option[self.Entry] =
      storage.get(id).map(self.Entry(id, _))

    def remove(entry: self.Entry): Unit = storage -= entry.id
  }
}
The problem is: the Entry constructed inside the Ops wrapper is treated as a type incompatible with the original Registry object's Entry:
object problem {
  case class Item(name: String)
  val items = new Registry[Item]("elems")

  import testOps._
  implicit val storage: TestStorage[Item] =
    scala.collection.mutable.Map[Long, Item](
      1L -> Item("whatever")
    )

  /** Compilation error:
      found   : _1.self.Entry where val _1: testOps.RegistryOps[problem.Item]
      required: eta$0$1.self.Entry
  */
  items.getById(1).foreach(items.remove)
}
The question is: Is there a way to declare the Ops signatures so that the compiler understands we're working with the same inner type?
(I've also tried self.type#Entry in RegistryOps with no luck.)
If I'm missing something and they actually are different types, I would appreciate any explanation and examples of why considering them the same might break the type system. Thanks!
To start off, it's worth noting that the implicitness here isn't really the issue—if you wrote out something like the following, it would fail in exactly the same way:
new RegistryOps(items).getById(1).foreach(e => new RegistryOps(items).remove(e))
There are ways to do the kind of thing you want to do, but they aren't really pleasant. One would be to desugar the implicit class so that you can have it capture a more specific type for the registry value:
class Registry[T](name: String) {
  case class Entry(id: Long, value: T)
}

object testOps {
  import scala.collection.mutable
  type TestStorage[T] = mutable.Map[Long, T]

  class RegistryOps[T, R <: Registry[T]](val self: R)(
    implicit storage: TestStorage[T]
  ) {
    def getById(id: Long): Option[R#Entry] =
      storage.get(id).map(self.Entry(id, _))

    def remove(entry: R#Entry): Unit = storage -= entry.id
  }

  implicit def toRegistryOps[T](s: Registry[T])(
    implicit storage: TestStorage[T]
  ): RegistryOps[T, s.type] = new RegistryOps[T, s.type](s)
}
This works just fine, either in the form you're using it, or slightly more explicitly:
scala> import testOps._
import testOps._
scala> case class Item(name: String)
defined class Item
scala> val items = new Registry[Item]("elems")
items: Registry[Item] = Registry@69c1ea07
scala> implicit val storage: TestStorage[Item] =
| scala.collection.mutable.Map[Long, Item](
| 1L -> Item("whatever")
| )
storage: testOps.TestStorage[Item] = Map(1 -> Item(whatever))
scala> val resultFor1 = items.getById(1)
resultFor1: Option[items.Entry] = Some(Entry(1,Item(whatever)))
scala> resultFor1.foreach(items.remove)
Note that the inferred static type of resultFor1 is exactly what you'd expect and want. Unlike the Registry[T]#Entry solution proposed in a comment above, this approach will prohibit you from taking an entry from one registry and removing it from another with the same T. Presumably you made Entry an inner case class specifically because you wanted to avoid that kind of thing. If you don't care you really should just promote Entry to its own top-level case class with its own T.
As a side note, you might think that it would work just to write the following:
implicit class RegistryOps[T, R <: Registry[T]](val self: R)(
  implicit storage: TestStorage[T]
) {
  def getById(id: Long): Option[R#Entry] =
    storage.get(id).map(self.Entry(id, _))

  def remove(entry: R#Entry): Unit = storage -= entry.id
}
But you'd be wrong, because the synthetic implicit conversion method the compiler will produce when it desugars the implicit class will use a wider R than you need, and you'll be back in the same situation you had without the R. So you have to write your own toRegistryOps and specify s.type.
(As a footnote, I have to say that having some mutable state that you're passing around implicitly sounds like an absolute nightmare, and I'd strongly recommend not doing anything remotely like what you're doing here.)
Posting a self-answer hoping it can help someone:
We can move the Entry type out of Registry into a RegistryEntry that takes an extra type parameter, and bind the registry's singleton type in a type alias there, like:
case class RegistryEntry[T, R <: Registry[T]](id: Long, value: T)
case class Registry[T](name: String) {
type Entry = RegistryEntry[T, this.type]
def Entry(id: Long, value: T): Entry = RegistryEntry(id, value)
}
This guarantees the type safety requested in the original question, and the "problem" code snippet also compiles.
I have to append a column generated by the method strToInt, which turns out not to be serializable.
def strToInt(colVal: String): Int = {
  var str = new Array[String](3)
  str(0) = "icmp"; str(1) = "tcp"; str(2) = "udp"
  var i = 0
  for (i <- 0 to str.length - 1) {
    if (str(i) == colVal) { return i }
  }
  throw new IllegalStateException("This never happens")
}
val strtoint = udf(strToInt(_:String)).apply(col("Atr 1"))
val newDF = df.withColumn("newCol", strtoint)
I have tried putting the function in a helper class this way,
object Helper extends Serializable {
def strToInt ...
}
but it doesn't help.
Change your code as follows, so that the function is executed at the withColumn level (not when the UDF is defined).
// define a UDF
val strtoint = udf(strToInt _)
// use it (aka execute)
val newDF = df.withColumn("newCol", strtoint(col("Atr 1")))
That seemingly little change changes what you create and how you execute it afterwards.
As you may have noticed already, udf creates a user-defined function that Spark SQL understands (and can execute):
udf[RT, A1](f: (A1) ⇒ RT): UserDefinedFunction
Defines a user-defined function of 1 arguments as user-defined function (UDF).
(I removed the implicit parameters to ease comprehension)
Quoting the scaladoc of UserDefinedFunction:
A user-defined function. To create one, use the udf functions in functions.
I don't agree with it much, but the "protocol" is to register a UDF first before you can execute it in your queries, say in the withColumn or select operators.
I'd also change strToInt to be more Scala-idiomatic (and hopefully easier to comprehend, too).
def strToInt(colVal : String) : Int = {
val strs = Array("icmp", "tcp", "udp")
strs.indexOf(colVal)
}
The key to understanding what's going on here is that while Scala is a functional programming language, it runs on the JVM, which has no native notion of a function type. At runtime, any val assigned an "anonymous" or "lambda" function will actually be an instance of an anonymous class with an apply method. So let's say you have the following:
object helper {
val isNegative: (Int => Boolean) = (n: Int) => n < 0
}
This compiles to the same thing as this:
object helper {
  val isNegative: Function1[Int, Boolean] = new Function1[Int, Boolean] {
    def apply(n: Int): Boolean = n < 0
  }
}
isNegative is really an anonymous class instance extending the trait Function1. When you instead do this:
object helper {
def isNegative(n: Int): Boolean = n < 0
}
Now isNegative is a method of the object helper instead. When it comes to dealing with Spark, if you were to do something like this:
// ds is a Dataset[Int]
ds.filter(isNegative)
In the first case Spark will have to serialize the anonymous class assigned to isNegative and fail because it is not serializable. In the second case, it will have to serialize helper, which does work because an object is serializable if all its state is serializable.
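As a small, self-contained illustration of the second case (the Predicates name and the sample data are made up, and extending Serializable mirrors the Helper object from the question), assuming an active SparkSession named spark:
object Predicates extends Serializable {
  def isNegative(n: Int): Boolean = n < 0
}

import spark.implicits._

val ds = Seq(-2, -1, 0, 1).toDS()
// The method reference eta-expands to a small function value; there is no
// surrounding non-serializable instance to drag into the task.
ds.filter(Predicates.isNegative _).show()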
To apply this to your problem, when you do this:
val strtoint = udf(strToInt(_:String)).apply(col("Atr 1"))
at runtime strtoint is an anonymous class instance with the trait Function1[String, UserDefinedFunction], that is, a function that generates a UserDefinedFunction when it is called. With the underscore filled in, it is identical to this:
val strtoInt: Function1[String, UserDefinedFunction] = new Function1[String, UserDefinedFunction] {
  def apply(t1: String) = udf(strToInt(t1: String)).apply(col("Atr 1"))
}
To minimally change your code, you can just change the val to a def:
def sti = udf(strToInt(_:String)).apply(col("Atr 1"))
Now sti is a member function of its enclosing class, and if that is serializable, you should be good as far as Spark is concerned. The other thing to keep in mind here is that strToInt also needs to be part of a serializable class or object.
The other way to fix this, as has been suggested, would be to change val strtoint to a UserDefinedFunction, which is a case class and thus serializable; however, you still need to make sure that strToInt is a member of a serializable class or object.
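A rough sketch of that alternative, assuming the df from the question and the serializable Helper object shown above (the strToIntUdf name is made up):
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// The UDF itself is a plain value holding a serializable function...
val strToIntUdf: UserDefinedFunction = udf(Helper.strToInt _)

// ...and it is only applied to the column inside the query.
val newDF = df.withColumn("newCol", strToIntUdf(col("Atr 1")))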
This problem seems to be similar to the problem I was experiencing (In Java).
My UDF function was using the Cipher library to encrypt something, and the exception that was thrown is:
Caused by: java.io.NotSerializableException: javax.crypto.Cipher
Serialization stack:
- object not serializable (class: javax.crypto.Cipher, value: javax.crypto.Cipher@625d02ce)
I could not add 'implements Serializable' to the Cipher class because it is a class provided by the JDK.
I used the following solution from this link: spark-how-to-call-udf-over-dataset-in-java
private static UDF1 toUpper = new UDF1<String, String>() {
public String call(final String str) throws Exception {
return str.toUpperCase();
}
};
Register the UDF, and then you can use the callUDF function.
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
sqlContext.udf().register("toUpper", toUpper, DataTypes.StringType);
peopleDF.select(col("name"),callUDF("toUpper", col("name"))).show();
Where, instead of calling str.toUpperCase(), I called my Cipher instance.
Note: There's an EDIT below!
Note: There's another EDIT below!
I have written a Scala annotation macro that is being passed a class and creates (or rather populates) a case object. The name of the case object is the same as the name of the passed class. More importantly, for every field of the passed class, there will be a field in the case object of the same name. The fields of the case object, however, are all of type String, and their value is the name of the type of the respective field in the passed class. Example:
// Using the annotation macro to populate a case object called `String`
@RegisterClass(classOf[String]) case object String
// The class `String` defines a field called `value` of type `char[]`.
// The case object also has a field `value`, containing `"char[]"`.
println(String.value) // Prints `"char[]"` to the console
This, however, seems to only work with pre-defined classes such as String. If I define a case class A(...) and try to do @RegisterClass(classOf[A]) case object A, I get the following error:
[info] scala.tools.reflect.ToolBoxError: reflective compilation has failed:
[info]
[info] not found: type A
What have I done wrong? The code of my macro can be found below. Also, if someone notices un-idiomatic Scala or bad practices in general, I wouldn't mind a hint. Thank you very much in advance!
class RegisterClass[T](clazz: Class[T]) extends StaticAnnotation {
  def macroTransform(annottees: Any*) =
    macro RegisterClass.expandImpl[T]
}

object RegisterClass {
  def expandImpl[T](c: blackbox.Context)(annottees: c.Expr[Any]*) = {
    import c.universe._
    val clazz: Class[T] = c.prefix.tree match {
      case q"new RegisterClass($clazz)" => c.eval[Class[T]](c.Expr(clazz))
      case _ => c.abort(c.enclosingPosition, "RegisterClass: Annotation expects a Class[T] instance as argument.")
    }
    annottees.map(_.tree) match {
      case List(q"case object $caseObjectName") =>
        if (caseObjectName.toString != clazz.getSimpleName)
          c.abort(c.enclosingPosition, "RegisterClass: Annotated case object and class T of passed Class[T] instance " +
            "must have the same name.")
        val clazzFields = clazz.getDeclaredFields.map(field => field.getName -> field.getType.getSimpleName).toList
        val caseObjectFields = clazzFields.map(field => {
          val fieldName: TermName = field._1
          val fieldType: String = field._2
          q"val $fieldName = $fieldType"
        })
        c.Expr[Any](q"case object $caseObjectName { ..$caseObjectFields }")
      case _ => c.abort(c.enclosingPosition, "RegisterClass: Annotation must be applied to a case object definition.")
    }
  }
}
EDIT: As Eugene Burmako pointed out, the error happens because class A hasn't been compiled yet, so a java.lang.Class for it doesn't exist. I have now started a bounty of 100 Stack Overflow points for anyone who has an idea of how one could get this to work!
EDIT 2: Some background on the use case: As part of my bachelor thesis I am working on a Scala DSL for expressing queries for event processing systems. Those queries are traditionally expressed as strings, which induces a lot of problems. A typical query would look like this: "select A.id, B.timestamp from pattern[A -> B]". Meaning: if an event of type A occurs and after that an event of type B occurs too, give me the id of the A event and the timestamp of the B event. The types A and B usually are simple Java classes over which I have no control. id and timestamp are fields of those classes. I would like queries of my DSL to look like this: select (A.id, B.timestamp) { /* ... */ }. This means that for every class representing an event type, e.g. A, I need a companion object -- ideally of the same name. This companion object should have the same fields as the respective class, so that I can pass its fields to the select function, like so: select (A.id, B.timestamp) { /* ... */ }. This way, if I tried to pass A.idd to the select function, it would fail at compile time if there was no such field in the original class -- because then there would not be one in the companion object either.
This isn't an answer to your macro problem, but it could be a solution to your general problem.
If you can allow a minor change to the syntax of your DSL, this might be possible without using macros (depending on other requirements not mentioned in this question).
scala> class Select[A,B]{
| def apply[R,S](fa: A => R, fb: B => S)(body: => Unit) = ???
| }
defined class Select
scala> def select[A,B] = new Select[A,B]
select: [A, B]=> Select[A,B]
scala> class MyA { def id = 42L }
defined class MyA
scala> class MyB { def timestamp = "foo" }
defined class MyB
scala> select[MyA,MyB](_.id, _.timestamp){ /* ... */ }
scala.NotImplementedError: an implementation is missing
I use the class Select here as a means to be able to specify the types of your event classes while letting the compiler infer the result types of the functions fa and fb. If you don't need those result types you could just write it as def select[A,B](fa: A => Any, fb: B => Any)(body: => Unit) = ???.
If necessary you can still implement the select or apply method as a macro. But using this syntax, you will no longer need to generate objects with macro annotations.
(Essentially I need some kind of a synthesis of these two questions (1, 2), but I'm not smart enough to combine them myself.)
I have a set of JAXB representations in Scala like this:
abstract class Representation {
def marshalToXml(): String = {
val context = JAXBContext.newInstance(this.getClass())
val writer = new StringWriter
context.createMarshaller.marshal(this, writer)
writer.toString()
}
}
class Order extends Representation {
@BeanProperty
var name: String = _
...
}
class Invoice extends Representation { ... }
The problem I have is with my unmarshalling "constructor" methods:
def unmarshalFromJson(marshalledData: String): {{My Representation Subclass}} = {
val mapper = new ObjectMapper()
mapper.getDeserializationConfig().withAnnotationIntrospector(new JaxbAnnotationIntrospector())
mapper.readValue(marshalledData, this.getClass())
}
def unmarshalFromXml(marshalledData: String): {{My Representation Subclass}} = {
val context = JAXBContext.newInstance(this.getClass())
val representation = context.createUnmarshaller().unmarshal(
new StringReader(marshalledData)
).asInstanceOf[{{Type of My Representation Subclass}}]
representation // Return the representation
}
Specifically, I can't figure out how to attach these unmarshalling methods in a typesafe and DRY way to each of my classes, and then to call them from Scala (and hopefully sometimes by using only abstract type information). In other words, I would like to do this:
val newOrder = Order.unmarshalFromJson(someJson)
And more ambitiously:
class Resource[R <: Representation] {
getRepresentation(marshalledData: String): R =
{{R's Singleton}}.unmarshalFromXml(marshalledData)
}
In terms of my particular stumbling blocks:
I can't figure out whether I should define my unmarshalFrom*() constructors once in the Representation class, or in a singleton Representation object - if the latter, I don't see how I can automatically inherit that down through the class hierarchy of Order, Invoice etc.
I can't get this.type (as per this answer) to work as a way of self-typing unmarshalFromJson() - I get a compile error type mismatch; found: ?0 where type ?0 required: Representation.this.type on the readValue() call
I can't figure out how to use the implicit Default[A] pattern (as per this answer) to work down my Representation class hierarchy to call the singleton unmarshalling constructors using type information only
I know this is a bit of a mammoth question touching on various different (but related) issues - any help gratefully received!
Alex
The key is to not try to attach the method to the class but rather pass it in as a parameter: indicate the type you are expecting and let the type system handle passing it in. I tried to make the unmarshal invocation read a little like a DSL.
val order = UnMarshalXml( xml ).toRepresentation[Order]
The following is a fully testable code snippet
abstract class Representation {
def marshalToXml(): String = {
val context = JAXBContext.newInstance(this.getClass)
val writer = new StringWriter
context.createMarshaller.marshal(this, writer)
writer.toString
}
}
@XmlRootElement
class Order extends Representation {
@BeanProperty
var name: String = _
}
case class UnMarshalXml( xml: String ) {
def toRepresentation[T <: Representation](implicit m:Manifest[T]): T = {
JAXBContext.newInstance(m.erasure).createUnmarshaller().unmarshal(
new StringReader(xml)
).asInstanceOf[T]
}
}
object test {
def main( args: Array[String] ) {
val order = new Order
order.name = "my order"
val xml = order.marshalToXml()
println("marshalled: " + xml )
val received = UnMarshalXml( xml ).toRepresentation[Order]
println("received order named: " + received.getName )
}
}
You should see the following output if you run test.main
marshalled: <?xml version="1.0" encoding="UTF-8" standalone="yes"?><order><name>my order</name></order>
received order named: my order
Here's the updated version of Neil's code which I used to support the second use case as well as the first:
case class UnmarshalXml(xml: String) {
def toRepresentation[T <: Representation](implicit m: Manifest[T]): T =
toRepresentation[T](m.erasure.asInstanceOf[Class[T]])
def toRepresentation[T <: Representation](typeT: Class[T]): T =
JAXBContext.newInstance(typeT).createUnmarshaller().unmarshal(
new StringReader(xml)
).asInstanceOf[T]
}
This supports simple examples like so:
val order = UnmarshalXml(xml).toRepresentation[Order]
But also for abstract type based usage, you can use like this:
val order = UnmarshalXml(xml).toRepresentation[T](typeOfT)
(Where you have grabbed and stored typeOfT using another implicit Manifest at the point of declaring T.)
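For the more ambitious Resource[R] case from the original question, a hedged sketch is to capture the Manifest when the resource is created and reuse the Class-based overload above (the names follow the question's pseudocode):
class Resource[R <: Representation](implicit m: Manifest[R]) {
  private val typeOfR: Class[R] = m.erasure.asInstanceOf[Class[R]]

  def getRepresentation(marshalledData: String): R =
    UnmarshalXml(marshalledData).toRepresentation[R](typeOfR)
}

// Usage sketch:
// val orders = new Resource[Order]
// val order = orders.getRepresentation(someXml)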