SnakeYAML and Spark result in an inability to construct objects - Scala

The following code executes fine in a Scala shell, given SnakeYAML version 1.17:
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.collection.mutable.ListBuffer
import scala.beans.BeanProperty
class EmailAccount {
  @scala.beans.BeanProperty var accountName: String = null

  override def toString: String = {
    return s"acct ($accountName)"
  }
}
val text = """accountName: Ymail Account"""
val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
val e = yaml.load(text).asInstanceOf[EmailAccount]
println(e)
However, when running in Spark (2.0.0 in this case), the resulting error is:
org.yaml.snakeyaml.constructor.ConstructorException: Can't construct a java object for tag:yaml.org,2002:EmailAccount; exception=java.lang.NoSuchMethodException: EmailAccount.<init>()
in 'string', line 1, column 1:
accountName: Ymail Account
^
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:350)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructObject(BaseConstructor.java:182)
at org.yaml.snakeyaml.constructor.BaseConstructor.constructDocument(BaseConstructor.java:141)
at org.yaml.snakeyaml.constructor.BaseConstructor.getSingleData(BaseConstructor.java:127)
at org.yaml.snakeyaml.Yaml.loadFromReader(Yaml.java:450)
at org.yaml.snakeyaml.Yaml.load(Yaml.java:369)
... 48 elided
Caused by: org.yaml.snakeyaml.error.YAMLException: java.lang.NoSuchMethodException: EmailAccount.<init>()
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:220)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.construct(Constructor.java:190)
at org.yaml.snakeyaml.constructor.Constructor$ConstructYamlObject.construct(Constructor.java:346)
... 53 more
Caused by: java.lang.NoSuchMethodException: EmailAccount.<init>()
at java.lang.Class.getConstructor0(Class.java:2810)
at java.lang.Class.getDeclaredConstructor(Class.java:2053)
at org.yaml.snakeyaml.constructor.Constructor$ConstructMapping.createEmptyJavaBean(Constructor.java:216)
... 55 more
I launched the Scala shell with:
scala -classpath "/home/placey/snakeyaml-1.17.jar"
I launched the Spark shell with:
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-shell --master local --jars /home/placey/snakeyaml-1.17.jar

Solution
Create a self-contained application and run it using spark-submit instead of using spark-shell.
I've created a minimal project for you as a gist here. All you need to do is put both files (build.sbt and Main.scala) in some directory, then run:
sbt package
in order to create a JAR. The JAR will be in target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar or a similar location. You can get SBT from here if you haven't used it yet. Finally, you can run the project:
/home/placey/Downloads/spark-2.0.0-bin-hadoop2.7/bin/spark-submit --class "Main" --master local --jars /home/placey/snakeyaml-1.17.jar target/scala-2.11/sparksnakeyamltest_2.11-1.0.jar
The output should be:
[many lines of Spark's log]
acct (Ymail Account)
[more lines of Spark's log]
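In case the gist link goes stale: this is not the gist itself, just a rough sketch of what the two files could contain. The project name is chosen to match the JAR name above, the versions follow the question's setup, and everything else is an assumption.
// build.sbt (sketch)
name := "sparkSnakeYamlTest"

version := "1.0"

scalaVersion := "2.11.8"

// snakeyaml is "provided" because it is passed to spark-submit via --jars
libraryDependencies += "org.yaml" % "snakeyaml" % "1.17" % "provided"

// Main.scala (sketch)
import org.yaml.snakeyaml.Yaml
import org.yaml.snakeyaml.constructor.Constructor
import scala.beans.BeanProperty

class EmailAccount {
  @BeanProperty var accountName: String = null
  override def toString: String = s"acct ($accountName)"
}

object Main {
  def main(args: Array[String]): Unit = {
    val yaml = new Yaml(new Constructor(classOf[EmailAccount]))
    val e = yaml.load("accountName: Ymail Account").asInstanceOf[EmailAccount]
    println(e)
  }
}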
Explanation
Spark's shell (REPL) transforms all classes you define in it by adding a $iw parameter to your constructors. I've explained it here. SnakeYAML expects a zero-parameter constructor for JavaBean-like classes, but there isn't one, so it fails.
You can try this yourself:
scala> class Foo() {}
defined class Foo
scala> classOf[Foo].getConstructors()
res0: Array[java.lang.reflect.Constructor[_]] = Array(public Foo($iw))
scala> classOf[Foo].getConstructors()(0).getParameterCount
res1: Int = 1
As you can see, Spark transforms the constructor by adding a parameter of type $iw.
Alternative solutions
Define your own Constructor
If you really need to get it working in the shell, you could define your own class extending org.yaml.snakeyaml.constructor.BaseConstructor and make sure that $iw gets passed to constructors, but this is a lot of work (I actually wrote my own Constructor in Scala for security reasons some time ago, so I have some experience with this).
You could also define a custom Constructor hard-coded to instantiate a specific class (EmailAccount in your case) similar to the DiceConstructor shown in SnakeYAML's documentation. This is much easier, but requires writing code for each class you want to support.
Example:
case class EmailAccount(accountName: String)

class EmailAccountConstructor extends org.yaml.snakeyaml.constructor.Constructor {

  val emailAccountTag = new org.yaml.snakeyaml.nodes.Tag("!emailAccount")
  this.rootTag = emailAccountTag
  this.yamlConstructors.put(emailAccountTag, new ConstructEmailAccount)

  private class ConstructEmailAccount extends org.yaml.snakeyaml.constructor.AbstractConstruct {
    def construct(node: org.yaml.snakeyaml.nodes.Node): Object = {
      // TODO: This is fine for quick prototyping in a REPL, but in a real
      // application you should probably add type checks.
      val mnode = node.asInstanceOf[org.yaml.snakeyaml.nodes.MappingNode]
      val mapping = constructMapping(mnode)
      val name = mapping.get("accountName").asInstanceOf[String]
      new EmailAccount(name)
    }
  }
}
You can save this as a file and load it in the REPL using :load filename.scala.
A bonus advantage of this solution is that it can create immutable case class instances directly. Unfortunately, the Scala REPL seems to have issues with imports, so I've used fully qualified names.
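Once the file is loaded, a quick usage sketch (same YAML text as in the question; the expected output is shown as a comment):
val yaml = new org.yaml.snakeyaml.Yaml(new EmailAccountConstructor)
val account = yaml.load("accountName: Ymail Account").asInstanceOf[EmailAccount]
println(account) // EmailAccount(Ymail Account)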
Don't use JavaBeans
You can also just parse YAML documents as simple Java maps:
scala> val yaml2 = new Yaml()
yaml2: org.yaml.snakeyaml.Yaml = Yaml:1141996301
scala> val e2 = yaml2.load(text)
e2: Object = {accountName=Ymail Account}
scala> val map = e2.asInstanceOf[java.util.Map[String, Any]]
map: java.util.Map[String,Any] = {accountName=Ymail Account}
scala> map.get("accountName")
res4: Any = Ymail Account
This way SnakeYAML won't need to use reflection.
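If you do want a Scala type at the end, converting the plain map by hand is a small step. A sketch, reusing the map from the transcript above and a case class like the one defined earlier:
import scala.collection.JavaConverters._

case class EmailAccount(accountName: String)

val account = EmailAccount(map.asScala("accountName").asInstanceOf[String])
// account: EmailAccount = EmailAccount(Ymail Account)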
However, since you're using Scala, I recommend trying MoultingYAML, which is a Scala wrapper for SnakeYAML. It parses YAML documents to simple Java types and then maps them to Scala types (even your own types like EmailAccount).
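A minimal sketch of what that could look like, assuming the moultingyaml artifact is on the classpath (its protocol style mirrors spray-json):
import net.jcazevedo.moultingyaml._

case class EmailAccount(accountName: String)

object EmailAccountYamlProtocol extends DefaultYamlProtocol {
  implicit val emailAccountFormat: YamlFormat[EmailAccount] = yamlFormat1(EmailAccount)
}

import EmailAccountYamlProtocol._

val account = "accountName: Ymail Account".parseYaml.convertTo[EmailAccount]
println(account) // EmailAccount(Ymail Account)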

Related

Write a method to a class dynamically at runtime in Scala and create a JAR

I would like to understand whether there is a way to add a method to an existing class at runtime and to create a JAR dynamically in Scala.
So far I have tried to create a class dynamically and am able to run it through reflection; however, the class is a dynamic class which isn't generated.
val mirror = runtimeMirror(getClass.getClassLoader)
val tb = ToolBox(mirror).mkToolBox()
val function = q"def function(x: Int): Int = x + 2"
val functionWrapper = "object FunctionWrapper { " + function + "}"
data.map(x => tb.eval(q"$functionSymbol.function($x)"))
I got this from another source; however, the class is available only for this run and will not be generated.
I would like to add a function to the existing class at runtime, compile it, and create a JAR for it.
Kindly suggest a way.
Thanks in advance.
I guess the code snippet you provided should actually look like this:
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox
val mirror = runtimeMirror(getClass.getClassLoader)
val tb = ToolBox(mirror).mkToolBox()
val function: Tree = q"def function(x: Int): Int = x + 2"
val functionWrapper: Symbol = tb.define(q"object FunctionWrapper { $function }".asInstanceOf[ImplDef])
val data: List[Tree] = List(q"1", q"2")
data.map(x => tb.eval(q"$functionWrapper.function($x)")) // List(3, 4)
... however, the class is a dynamic class which isn't generated.
... however, the class is available only for this run and will not be generated.
How did you check that the class is not generated? (Which class, FunctionWrapper?)
is there a way to add a method to an existing class at runtime and to create a JAR dynamically in Scala.
I would like to add a function to the existing class at runtime, compile it, and create a JAR for it.
What is "existing class"? Do you have access to its sources? Then you can modify the sources, compile them etc.
Does the class exist as a .class file? You can modify its byte code with Byte-buddy, ASM, Javassist, cglib etc., instrument the byte code with aspects etc.
Is it dynamic class (like FunctionWrapper above)? How did you create it? (For FunctionWrapper you have access to its Symbol so you can use it in further sources.)
Is the class already loaded? Then you'll have to play with class loaders (unload, modify, load modified).
Can a Java class add a method to itself at runtime?
In Java, given an object, is it possible to override one of the methods?
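For the .class-file route mentioned above, a minimal sketch with Javassist (the class name, method body, and output directory are purely hypothetical, and javassist is assumed to be on the classpath):
import javassist.{ClassPool, CtNewMethod}

val pool = ClassPool.getDefault
// Placeholder for the fully qualified name of your existing class
val ct = pool.get("com.example.Existing")
// Compile and attach a new method using Javassist's built-in source compiler
ct.addMethod(CtNewMethod.make("public int function(int x) { return x + 2; }", ct))
// Write the modified .class under ./out; it can then be packaged into a JAR (e.g. with `jar cf`)
ct.writeFile("out")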

Scala can't import a class that Java can

Why can I do this in Java:
import javax.swing.GroupLayout.Group;
but if I do the same in Scala (by using Ammonite), I get this:
value Group is not a member of object javax.swing.GroupLayout
possible cause: maybe a semicolon is missing before `value Group'?
import javax.swing.GroupLayout.Group
Is it due to the fact that Group is a public class derived from a private class called Spring?
I can import neither SequentialGroup nor ParallelGroup.
Is it a bug in Scala?
I'm using Java 11 and Scala 2.12.10.
Scala 2.13.1 also fails. :-(
I need the import for defining a generic method that can take a Group parameter, which could be either a ParallelGroup or a SequentialGroup.
I'd like to generate a generic method that takes a Group as a parameter, which could be either a ParallelGroup or a SequentialGroup
That would be a type projection:
def method(group: GroupLayout#Group) = ...
or if you also have the layout the group belongs to,
def method(layout: GroupLayout)(group: layout.Group) = ...
or
val layout: GroupLayout = ...
def method(group: layout.Group) = ...
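For instance, a small self-contained sketch of such a method using the type projection (the components and layout are just placeholders):
import javax.swing.{GroupLayout, JButton, JLabel, JPanel}

// Accepts both SequentialGroup and ParallelGroup instances
def addToGroup(group: GroupLayout#Group, component: java.awt.Component): Unit =
  group.addComponent(component)

val panel = new JPanel()
val layout = new GroupLayout(panel)

addToGroup(layout.createSequentialGroup(), new JButton("ok"))
addToGroup(layout.createParallelGroup(), new JLabel("hi"))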

Is it possible to test a Chisel Reg() in the console?

To test Chisel code, I start sbt in my project directory (where the build.sbt file is) and then open the Scala console. I can import the chisel3 library:
$ cd myproject
$ sbt
sbt:myproject> console
scala> import chisel3._
import chisel3._
Then I can test some Chisel code, for example a data type:
scala> val plop = "b01010101".U(20.W)
plop: chisel3.UInt = UInt<20>(85)
But I can't test Reg() or other Module() elements:
scala> val plopReg = RegInit(23.U(24.W))
java.lang.IllegalArgumentException: requirement failed: must be inside Builder context
at scala.Predef$.require(Predef.scala:281)
at chisel3.internal.Builder$.dynamicContext(Builder.scala:232)
at chisel3.internal.Builder$.currentClock(Builder.scala:308)
at chisel3.internal.Builder$.forcedClock(Builder.scala:318)
at chisel3.RegInit$.apply(Reg.scala:155)
at chisel3.RegInit$.apply(Reg.scala:173)
... 36 elided
Are there any tips for testing these Chisel elements in the console? Or is it mandatory to write a source code file?
What's going on here is that UInt is a Chisel type while Reg is a hardware type.
You can play with hardware types only inside a module. I often do something like the following to play with them on the console:
import chisel3._
import chisel3.stage.{ChiselStage, ChiselGeneratorAnnotation}
import chisel3.util.Cat
import firrtl.EmittedCircuitAnnotation
class Foo extends MultiIOModule {
  val in = IO(Input(Bool()))
  val out = IO(Output(Bool()))

  val tmp = RegNext(~in)

  out := tmp
}

val args = Array(
  "-X", "verilog",
  "-E", "high",
  "-E", "middle",
  "-E", "low",
  "-E", "verilog")
(new ChiselStage).execute(args, Seq(ChiselGeneratorAnnotation(() => new Foo)))
You can then look at the various outputs inside your chisel3 top-level directory.
More Information
What's going on, specifically, is that UInt (and things like it) are factory methods that generate classes (technically UInt is really an object that extends UIntFactory). When you do UInt(4.W), that's constructing a new UInt. You should be able to construct new classes anywhere you want, which is why this works in the console.
However, when you do Reg(UInt(4.W)) that's interacting with global mutable state used during the elaboration process to associate a register with a specific module. This global mutable state is stored inside the Builder. The error that you get is coming from the Builder where you've tried to use its methods without first being inside a module.
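To tie this back to the original error: the very same RegInit call succeeds once it runs inside a module body during elaboration, because the Builder context is active there. A sketch in the same style as the Foo example above:
import chisel3._
import chisel3.stage.{ChiselStage, ChiselGeneratorAnnotation}

class Plop extends MultiIOModule {
  val out = IO(Output(UInt(24.W)))
  // The line that failed on the bare console works here: the Builder is active.
  val plopReg = RegInit(23.U(24.W))
  out := plopReg
}

// Elaborate to Verilog, as in the example above
(new ChiselStage).execute(
  Array("-X", "verilog"),
  Seq(ChiselGeneratorAnnotation(() => new Plop)))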

Use implicit value from one module in another in Scala/Spark

I'm trying to get the SQLContext instance from one module in another module. The first module instantiates it as an implicit sqlContext, and I had (erroneously) thought that I could then use an implicit parameter in the second module, but the compiler informs me that:
could not find implicit value for parameter sqlCtxt: org.apache.spark.sql.SQLContext
Here's the skeletal setup I have (I have elided imports and details):
-----
// Application.scala
-----
package apps
object Application extends App {
  val env = new SparkEnvironment("My app", ...)
  try {
    // Call methods from various packages that internally use code from DFExtensions.scala
  }
}
-----
// SparkEnvironment.scala
-----
package common
class SparkEnvironment(val app: String, ...) {
  @transient lazy val conf: SparkConf = new SparkConf().setAppName(app)
  @transient implicit lazy val sc: SparkContext = new SparkContext(conf)
  @transient implicit lazy val sqlContext: SQLContext = new SQLContext(sc)
  ...
}
-----
// DFExtensions.scala
-----
package util
object DFExtensions {
  private def myFun(...)(implicit sqlCtxt: SQLContext) = { ... }

  implicit final class DFExt(val df: DataFrame) extends AnyVal {
    // Extension methods for DataFrame where myFun is supposed to be used -- this is where the compile error occurs!
  }
}
Since it's a multi-project sbt setup, I don't want to pass the env instance around to all related objects, because the stuff in util is really a shared library. Each sub-project (i.e. app) has its own instance created in the main method.
Because myFun is only called from the implicit class DFExt, I thought about creating an implicit just before each call, à la implicit val sqlCtxt = df.sqlContext. That compiles, but it's kind of ugly, and I would no longer need the implicit in SparkEnvironment.
According to this discussion the implicit sqlContext instance is not in scope, hence compilation fails. I'm not sure a package object would work because the implicit value and parameter are in different packages.
Is what I'm trying to achieve even possible? Is there a better alternative?
The idea is to have several sub-projects that use the same libraries and core functions live in the same project. They are typically updated together, so it's nice to have them in a single place. Most of the library functions directly work on data frames and other structures in Spark, but occasionally I need to do something that requires an instance of SparkContext or SQLContext, for instance writing a query with sqlContext.sql, as some syntax is not yet natively supported (e.g. flattening with outer lateral views).
Each sub-project has its own main method that creates an implicit instance. Obviously the libraries do not 'know' about this as they are in different packages and I don't pass around the instances. I had thought that somehow implicits are looked for at runtime, so that when an application runs there is an instance of SQLContext defined as an implicit. It's possible that a) it's not in scope because it's in a different package or b) what I'm trying to do is just a bad idea.
Currently there is only one main method because I first have to split the application in multiple components, which I have not done yet.
Just in case it helps:
Spark 1.4.1
Scala 2.10
sbt 0.13.8
Because myFun is only called from the implicit class DFExt, I thought about creating an implicit just before each call, à la implicit val sqlCtxt = df.sqlContext. That compiles, but it's kind of ugly, and I would no longer need the implicit in SparkEnvironment.
Just put the implicit and myFun inside DFExt:
implicit final class DFExt(val df: DataFrame) extends AnyVal {
  private implicit def sqlCtxt: SQLContext = df.sqlContext

  // no need to take an implicit parameter, as sqlCtxt is already in scope
  private def myFun(...) = ...

  // The extension methods can now use sqlCtxt and/or myFun freely
}
You could also make sqlCtxt a val, but then: 1) DFExt can't extend AnyVal anymore; 2) it needs to be initialized even if the extension method you call doesn't need it; 3) any calls to sqlCtxt are likely to be inlined, so you are just accessing a val from df instead of this anyway. If they aren't, this means you are using it far too little to matter.
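To make the pattern concrete, here is a self-contained sketch of DFExtensions with one real extension method; the method name viaSql and the query are made up, and the Spark 1.4-era DataFrame API is assumed:
package util

import org.apache.spark.sql.{DataFrame, SQLContext}

object DFExtensions {
  implicit final class DFExt(val df: DataFrame) extends AnyVal {
    private def sqlCtxt: SQLContext = df.sqlContext

    // Example of an extension method that needs the SQLContext, e.g. for
    // SQL-only syntax such as outer lateral views.
    def viaSql(tempTable: String, query: String): DataFrame = {
      df.registerTempTable(tempTable)
      sqlCtxt.sql(query)
    }
  }
}

// At any call site, no implicit SQLContext is required:
//   import util.DFExtensions._
//   val flattened = someDf.viaSql("t", "SELECT item FROM t LATERAL VIEW OUTER explode(items) x AS item")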

Scala 2.10 reflection: ClassSymbol.isCaseClass works in scala console but not in script/app

I am playing around with reflection in Scala 2.10.0-M7 and stumbled upon the ClassSymbol.isCaseClass method, which behaves as expected in the Scala console but not when executed as a Java application or as a Scala script.
I've defined TestScript.scala like this:
import reflect.runtime.currentMirror
case class TestCase(foo: String)
object Test {
  def main(args: Array[String]) {
    val classSymbol = currentMirror.reflect(new TestCase("foo")).symbol
    val isCaseClass = classSymbol.isCaseClass
    println(s"isCaseClass: $isCaseClass")
  }
}
Test.main(Array())
If I execute it on the command line by calling:
$ scala TestScript.scala
I get this output:
isCaseClass: false
If I instead enter the code into the interactive Scala shell or load it like this:
scala> :load TestScript.scala
I get the following correct output:
Loading TestScript.scala...
import reflect.runtime.currentMirror
defined class TestCase
defined module Test
isCaseClass: true
If I compile it and execute it as a standard Java app, I get false as the result of ClassSymbol.isCaseClass again.
What am I missing? What are the differences between the scala console environment and the java runtime environment? How can I get the correct result in a real application?
https://issues.scala-lang.org/browse/SI-6277
val classSymbol = currentMirror.reflect(new TestCase("foo")).symbol
classSymbol.typeSignature // force initialization of the symbol (workaround for SI-6277)
val isCaseClass = classSymbol.isCaseClass
println(s"isCaseClass: $isCaseClass")
Edit: to answer your last question, you wouldn't be using a milestone in a real application. :)
Update: fixed since Scala 2.10.0-RC1.