Avro serialization with generic type issue - scala

I need to write a function in Scala that returns an Array[Byte] serialized with AvroOutputStream, but in Scala I can't get the class of the generic object I'm passing as input.
Here is my util class:
class AvroUtils {
def createByteArray[T](obj: T): Array[Byte] = {
val byteArrayStream = new ByteArrayOutputStream()
val output = AvroOutputStream.binary[T](byteArrayStream)
output.write(obj)
output.close()
byteArrayStream.toByteArray()
}
}
If you test this code, you will see that AvroOutputStream can't recognize the class of T, so it can't generate a schema for it.
Hope you can help! thanks
PS: Already tried with TypeTag and ClassTag, nothing works.

You need to add the proper implicits for T, namely SchemaFor and ToRecord:
def createByteArray[T : SchemaFor : ToRecord](obj: T): Array[Byte] = {
val byteArrayStream = new ByteArrayOutputStream()
val output = AvroOutputStream.binary[T](byteArrayStream)
output.write(obj)
output.close()
byteArrayStream.toByteArray()
}
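To illustrate why the context bounds help, here is a minimal sketch that does not depend on the Avro library. SchemaName is a hypothetical stand-in for a type class such as SchemaFor; the point is only that [T: SchemaName] asks the compiler to supply an implicit instance at each call site, where T is known.

```scala
object ContextBounds {
  // Hypothetical type class, standing in for a library type class such as SchemaFor.
  trait SchemaName[T] { def name: String }

  object SchemaName {
    implicit val forInt: SchemaName[Int] =
      new SchemaName[Int] { def name = "int" }
    implicit val forString: SchemaName[String] =
      new SchemaName[String] { def name = "string" }
  }

  // Without the context bound there is no SchemaName[T] in scope inside the
  // method, so the following would not compile:
  //   def describe[T](obj: T): String = implicitly[SchemaName[T]].name

  // With the context bound, the instance is resolved at the call site.
  def describe[T: SchemaName](obj: T): String =
    s"${implicitly[SchemaName[T]].name}: $obj"
}
```

Calling ContextBounds.describe(42) resolves the Int instance at the call site, which is the same mechanism the answer relies on for SchemaFor and ToRecord.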

Related

Scala failing to resolve a type mismatch problem in a complicated code

I have the following setup, and I am trying to get the code past compile-time type checking with minimal modification, since the code is generated by a tool and not written by hand.
I think the problem is I need to come up with a better definition for T_MAX_LATTICE[T] or T_IntegerMaxLattice in M_TEST_COLL.
The code is kind of large so I can't post the whole code here but I put the repo URL at the bottom. I am struggling to visualize the type hierarchy.
I know the question is too general, but all I am looking for is a way to compile the code without using unchecked casts (asInstanceOf).
type T_MAX_LATTICE[T] = T;
trait C_TEST_COLL[T_Result, T_T] extends C_TYPE[T_Result] with C_TINY[T_Result] {
type T_IntegerMaxLattice;
val t_IntegerMaxLattice : C_TYPE[T_IntegerMaxLattice] with C_MAX_LATTICE[T_IntegerMaxLattice,T_Integer];
type T_Integers;
val t_Integers : C_TYPE[T_Integers]with C_SET[T_Integers,T_Integer];
class M_TEST_COLL[T_T](name : String,val t_T : C_TYPE[T_T] with C_TINY[T_T])
extends Module(name)
with C_TEST_COLL[T_T,T_T]
{
val t_Result : this.type = this;
val t_IntegerMaxLattice = new M_MAX_LATTICE[T_Integer]("IntegerMaxLattice",t_Integer,0);
type T_IntegerMaxLattice = T_MAX_LATTICE[T_Integer];
The error I am getting:
Error:Error:line (42)type mismatch;
found : M_MAX_LATTICE[basic_implicit.T_Integer]
(which expands to) M_MAX_LATTICE[Int]
required: C_TYPE[M_TEST_COLL.this.T_IntegerMaxLattice] with C_MAX_LATTICE[M_TEST_COLL.this.T_IntegerMaxLattice,basic_implicit.T_Integer]
(which expands to) C_TYPE[Int] with C_MAX_LATTICE[Int,Int]
val t_IntegerMaxLattice = new M_MAX_LATTICE[T_Integer]("IntegerMaxLattice",t_Integer,0);
Repo url
I think I have created a minimal example:
type T_MAX_LATTICE[T] = T;
trait C_TEST_COLL[T_Result, T_T] extends C_TYPE[T_Result] with C_TINY[T_Result] {
type T_IntegerMaxLattice;
val t_IntegerMaxLattice: C_TYPE[T_IntegerMaxLattice] with C_MAX_LATTICE[T_IntegerMaxLattice, T_Integer];
type T_Integers;
val t_Integers: C_TYPE[T_Integers] with C_SET[T_Integers, T_Integer];
}
class M_TEST_COLL[T_T](name : String,val t_T : C_TYPE[T_T] with C_TINY[T_T])
extends Module(name)
with C_TEST_COLL[T_T,T_T] {
val t_Result: this.type = this;
val t_IntegerMaxLattice = new M_MAX_LATTICE[T_Integer]("IntegerMaxLattice", /*t_Integer,*/ 0);
type T_IntegerMaxLattice = T_MAX_LATTICE[T_Integer];
val t_Integers = ???/*new M_SET[T_Integer]("Integers",t_Integer);*/
type T_Integers /*= /*TI*/T_SET[T_Integer];*/
}
trait C_TYPE[T_Result] /*extends C_BASIC[T_Result] with C_PRINTABLE[T_Result]*/
trait C_TINY[T_Result] extends C_TYPE[T_Result]
trait C_MAX_LATTICE[T_Result, T_TO] /*extends C_MAKE_LATTICE[T_Result,T_TO]*/
type T_Integer = Int
// val t_Integer = new M_INTEGER("Integer")
trait C_SET[T_Result, T_ElemType] extends C_TYPE[T_Result] /*with C_COMPARABLE[T_Result] with C_COLLECTION[T_Result,T_ElemType] with C_ABSTRACT_SET[T_Result,T_ElemType] with C_COMBINABLE[T_Result]*/
class Module(val mname : String)
class M_MAX_LATTICE[T_TO]
(name : String, /*t_TO:C_ORDERED[T_TO],*/v_min_element : T_TO)
/*extends M_MAKE_LATTICE[T_TO](name,t_TO,v_min_element,
new M__basic_3[ T_TO](t_TO).v__op_z,
new M__basic_3[ T_TO](t_TO).v__op_z0,
new M__basic_13[ T_TO](t_TO).v_max,
new M__basic_13[ T_TO](t_TO).v_min)
with C_MAX_LATTICE[T_TO,T_TO] with C_ORDERED[T_TO]*/
I think the compile error is clear. You are trying to assign new M_MAX_LATTICE[T_Integer](...), which has type M_MAX_LATTICE[Int], to t_IntegerMaxLattice, which overrides a value of a different type.
If you make class M_MAX_LATTICE extend trait C_TYPE, your code seems to compile:
class M_MAX_LATTICE[T_TO]
(name : String, t_TO:C_ORDERED[T_TO],v_min_element : T_TO)
extends M_MAKE_LATTICE[T_TO](name,t_TO,v_min_element,
new M__basic_3[ T_TO](t_TO).v__op_z,
new M__basic_3[ T_TO](t_TO).v__op_z0,
new M__basic_13[ T_TO](t_TO).v_max,
new M__basic_13[ T_TO](t_TO).v_min)
with C_MAX_LATTICE[T_TO,T_TO] with C_ORDERED[T_TO]
with C_TYPE[T_TO] //added
{
val v_less = t_TO.v_less;
val v_less_equal = t_TO.v_less_equal;
val v_assert: T_TO => Unit = ??? //added
val v_node_equivalent: (T_TO, T_TO) => T_OrLattice = ??? //added
val v_string: T_TO => String = ??? //added
}
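The same rule can be shown on a much smaller example. This is only a sketch with hypothetical names (CType, CMaxLattice, Coll, MaxLattice, Impl), not the generated code: a concrete val can implement an abstract val typed as an intersection only if its class extends every trait in that intersection.

```scala
object LatticeDemo {
  trait CType[T]
  trait CMaxLattice[T, E]

  trait Coll {
    type L
    // The implementing value must be both a CType[L] and a CMaxLattice[L, Int].
    val lattice: CType[L] with CMaxLattice[L, Int]
  }

  // Extending both traits is what makes the override below type-check.
  // Dropping "CType[E] with" here produces a type mismatch on Impl.lattice.
  class MaxLattice[E](val min: E) extends CType[E] with CMaxLattice[E, E]

  class Impl extends Coll {
    type L = Int
    val lattice: MaxLattice[Int] = new MaxLattice[Int](0)
  }
}
```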

DSL Like Syntax in Scala

I'm trying to come up with a CSV Parser that can be called like this:
parser parse "/path/to/csv/file" using parserConfiguration
Here the parser is a class parameterized by the target case class into which the CSV file will be parsed:
class CSVParser[A] {
  def parse(path: String) = Source.fromFile(path).getLines().mkString("\n")
  def using(cfg: ParserConfig) = ??? // How do I chain this optionally?
}
val parser = new CSVParser[SomeCaseClass]
I managed to get up to the point where I can call:
parser parse "/the/path/to/the/csv/file/"
But I do not want to run the parse method yet, because I want to apply the configuration through the DSL-style using call shown above. So there are two rules here: if the caller does not supply a parserConfig, parsing should run with the default; if the caller does supply one, I want to apply it and then run the parse. I tried a combination of implicits, but could not get them to work properly.
Any suggestions?
EDIT: So the solution looks like this as per comments from "Cyrille Corpet":
class CSVReader[A] {
def parse(path: String) = ReaderWithFile[A](path)
case class ReaderWithFile[A](path: String) {
def using(cfg: CSVParserConfig): Seq[A] = {
val lines = Source.fromFile(path).getLines().mkString("\n")
println(lines)
println(cfg)
null
}
}
object ReaderWithFile {
implicit def parser2parsed[A](parser: ReaderWithFile[A]): Seq[A] = parser.using(defaultParserCfg)
}
}
object CSVReader extends App {
def parser[A] = new CSVReader[A]
val sss: Seq[A] = parser parse "/csv-parser/test.csv" // assign this to a val so that the implicit conversion gets applied!! Very important to note!
}
I guess I need to get the implicit in scope at the location where I call the parser parse, but at the same time I do not want to mess up the structure that I have above!
If you replace using with an operator that has higher precedence than parse, you can get it to work without extra type annotations. Take << for instance:
object parsedsl {
class ParserConfig
object ParserConfig {
val default = new ParserConfig
}
case class ParseUnit(path: String, config: ParserConfig)
object ParseUnit {
implicit def path2PU(path: String) = ParseUnit(path, ParserConfig.default)
}
implicit class ConfigSyntax(path: String) {
def <<(config: ParserConfig) = ParseUnit(path, config)
}
class CSVParser {
def parse(pu: ParseUnit) = "parsing"
}
}
import parsedsl._
val parser = new CSVParser
parser parse "path" << ParserConfig.default
parser parse "path"
Your parse method should just return a partial result, without doing any work yet. To handle the default configuration, you can use an implicit conversion to the output type:
class CSVParser[A] {
def parse(path: String) = ParserWithFile[A](path)
}
case class ParserWithFile[A](path: String) {
def using(cfg: ParserConfig): A = ???
}
object ParserWithFile {
implicit def parser2parsed[A](parser: ParserWithFile[A]): A = parser.using(ParserConfig.default)
}
val parser = new CSVParser[SomeCaseClass]
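Here is a runnable sketch of this approach under simplifying assumptions: the input is passed as a string rather than read from a file, the result is a Seq[Seq[String]] rather than a case class, and the names CsvDsl, ParserWithInput, and runWithDefault are made up for the illustration.

```scala
import scala.language.implicitConversions

object CsvDsl {
  case class ParserConfig(separator: Char)
  object ParserConfig { val default: ParserConfig = ParserConfig(',') }

  // Partial result: remembers the input, parses nothing until `using` is called.
  case class ParserWithInput(input: String) {
    def using(cfg: ParserConfig): Seq[Seq[String]] =
      input.split("\n").toSeq.map(_.split(cfg.separator).toSeq)
  }

  object ParserWithInput {
    // Applied when a parsed result is expected but `using` was not called.
    implicit def runWithDefault(p: ParserWithInput): Seq[Seq[String]] =
      p.using(ParserConfig.default)
  }

  class CsvParser {
    def parse(input: String): ParserWithInput = ParserWithInput(input)
  }

  val parser = new CsvParser

  // Explicit configuration, chained in infix form.
  val explicitCfg: Seq[Seq[String]] = parser parse "a;b\nc;d" using ParserConfig(';')

  // No configuration: the type annotation triggers the implicit conversion.
  val defaultCfg: Seq[Seq[String]] = parser parse "a,b\nc,d"
}
```

The type annotation on defaultCfg is what drives the conversion; without an expected type the value stays a ParserWithInput.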

Invoke a method on a generic type with scala and reflect package

My question is based on what I found on the following page (I am still too new to Scala to manage what I want to do):
reflection overview
The purpose of my code is to invoke a method from a generic type and not an instance of a known type.
The following demonstrates the idea:
class A {
def process = {
(1 to 1000).foreach(x => x + 10)
}
}
def getTypeTag[T: ru.TypeTag](obj: T) = ru.typeTag[T]
def perf[T: ru.TypeTag](t: T, sMethodName: String): Any = {
val m = ru.runtimeMirror(t.getClass.getClassLoader)
val myType = ru.typeTag[T].tpe
val mn = myType.declaration(ru.newTermName(sMethodName)).asMethod
val im = m.reflect(getTypeTag(t))
val toCall = im.reflectMethod(mn)
toCall()
}
val a = new A
perf(a, "process")
The code compiles fine (in a worksheet) but produces the following stack trace at run time:
scala.ScalaReflectionException: expected a member of class TypeTagImpl, you provided method A$A11.A$A11.A.process
at scala.reflect.runtime.JavaMirrors$JavaMirror.scala$reflect$runtime$JavaMirrors$JavaMirror$$ErrorNotMember(test-log4j.sc:126)
at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$scala$reflect$runtime$JavaMirrors$JavaMirror$$checkMemberOf$1.apply(test-log4j.sc:221)
at scala.reflect.runtime.JavaMirrors$JavaMirror.ensuringNotFree(test-log4j.sc:210)
at scala.reflect.runtime.JavaMirrors$JavaMirror.scala$reflect$runtime$JavaMirrors$JavaMirror$$checkMemberOf(test-log4j.sc:220)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaInstanceMirror.reflectMethod(test-log4j.sc:257)
at scala.reflect.runtime.JavaMirrors$JavaMirror$JavaInstanceMirror.reflectMethod(test-log4j.sc:239)
at #worksheet#.perf(test-log4j.sc:20)
at #worksheet#.get$$instance$$res0(test-log4j.sc:28)
at #worksheet#.#worksheet#(test-log4j.sc:138)
Any idea how to fix this?
Many thanks to all
In order to reflect a particular object, you have to pass it to Mirror.reflect(obj: T), and you're passing its typeTag for some reason. To fix, you have to modify perf signature to generate a ClassTag along with a TypeTag, and pass t directly to reflect, like so:
import scala.reflect.ClassTag
import scala.reflect.runtime.{universe => ru}
class A {
def process = {
(1 to 1000).foreach(x => x + 10)
println("ok!")
}
}
def perf[T : ClassTag : ru.TypeTag](t: T, sMethodName: String): Any = {
// ^ modified here
val m = ru.runtimeMirror(t.getClass.getClassLoader)
val myType = ru.typeTag[T].tpe
val mn = myType.decl(ru.TermName(sMethodName)).asMethod
val im = m.reflect(t)
// ^ and here
val toCall = im.reflectMethod(mn)
toCall()
}
val a = new A
perf(a, "process")
// ok!
// res0: Any = ()
(Note: I also replaced the deprecated declaration and newTermName with the recommended alternatives, decl and TermName.)
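If the Scala reflection API is not required, a zero-argument method can also be invoked by name with plain Java reflection. This is a different technique from the one above, shown only as a sketch; ReflectCall is a made-up wrapper, and process here returns a value so that the result of the call is visible.

```scala
object ReflectCall {
  class A {
    // Returns a value so the caller can observe that the method really ran.
    def process: Int = (1 to 1000).map(_ + 10).sum
  }

  // Looks up a public zero-argument method by name on the runtime class
  // of the target and invokes it on that same instance.
  def perf(target: AnyRef, methodName: String): Any =
    target.getClass.getMethod(methodName).invoke(target)
}
```

As in the fix above, the key point is that the method is invoked on the object itself, not on its type tag.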

Access Spark broadcast variable in different classes

I am broadcasting a value in a Spark Streaming application, but I am not sure how to access that variable in a class other than the one where it was broadcast.
My code looks as follows:
object AppMain {
  def main(args: Array[String]): Unit = {
    //...
    val broadcastA = sc.broadcast(a)
    //..
    lines.foreachRDD(rdd => {
      val obj = AppObject1
      rdd.filter(p => obj.apply(p))
      rdd.count
    })
  }
}
object AppObject1 {
  def apply(str: String): Boolean = {
    AnotherObject.process(str)
  }
}
object AnotherObject {
  // I want to use broadcast variable in this object
  val B = broadcastA.value // compilation error here
  def process(str: String): Boolean = {
    //need to use B inside this method
  }
}
Can anyone suggest how to access broadcast variable in this case?
Ignoring possible serialization issues, there is nothing particularly Spark-specific here. If you want to use an object, it has to be available in the current scope, and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
or as a method argument:
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mix in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on the performance considerations raised earlier.
The options proposed by zero233 are indeed a very elegant way of doing this kind of thing in Scala. At the same time, it is important to understand the implications of using certain patterns in a distributed system.
It is not the best idea to use the mixin approach, or any logic that uses the state of the enclosing class. Whenever you use state of the enclosing class within lambdas, Spark has to serialize the outer object. This is not always the case, but you are better off writing safer code than accidentally blowing up the whole cluster one day.
Being aware of this, I would personally pass the broadcast explicitly as an argument to the methods (the method argument approach), since this does not result in serialization of the outer class.
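The capture problem can be reproduced without Spark. The sketch below uses plain Java serialization as a stand-in for what happens when a closure is shipped to executors; Driver, capturesThis, capturesLocal, and serializes are hypothetical names for the illustration.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCapture {
  // Deliberately not Serializable, like a class that holds driver-side state.
  class Driver(val offset: Int) {
    // Reads a field of the enclosing instance, so the closure captures `this`.
    def capturesThis: Int => Int = x => x + offset

    // Copies the field into a local first, so the closure captures only an Int.
    def capturesLocal: Int => Int = {
      val o = offset
      x => x + o
    }
  }

  // True if the value can be written with Java serialization.
  def serializes(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With a Driver instance d, serializes(d.capturesThis) fails because the whole Driver would have to be serialized, while serializes(d.capturesLocal) succeeds. Passing the broadcast as a method argument avoids the capture in the same way the local copy does.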
You can use classes and pass the broadcast variable to them.
Your pseudocode should look like this:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}

Scala serialization/deserialization of singleton object

I am quite new to the Scala programming language, and I currently need to do the following. I have a singleton object like this:
object MyObject extends Serializable {
val map: HashMap[String, Int] = null
val x: Int = -1
val foo: String = ""
}
Now I want to avoid having to serialize each field of this object separately, so I was considering writing the whole object to a file and then, on the next execution of the program, reading the file and initializing the singleton object from it. Is there any way to do this?
Basically what I want is when the serialization file doesn't exist, those variables to be initialized to new structures, while when it exists, the fields to be initialized from the ones on the file. But I want to avoid having to serialize/deserialize every field manually...
UPDATE:
I had to use a custom deserializer as presented here: https://issues.scala-lang.org/browse/SI-2403, since I had issues with a custom class I use inside the HashMap as values.
UPDATE2:
Here is the code I use to serialize:
val store = new ObjectOutputStream(new FileOutputStream(new File("foo")))
store.writeObject(MyData)
store.close
And the code to deserialize (in a different file):
@transient private lazy val loadedData: MyTrait = {
if(new File("foo").exists()) {
val in = new ObjectInputStream(new FileInputStream("foo")) {
override def resolveClass(desc: java.io.ObjectStreamClass): Class[_] = {
try { Class.forName(desc.getName, false, getClass.getClassLoader) }
catch { case ex: ClassNotFoundException => super.resolveClass(desc) }
}
}
val obj = in.readObject().asInstanceOf[MyTrait]
in.close
obj
}
else null
}
Thanks,
There is no need to serialize an object that has only immutable fields (the compiler will initialize it for you). I will assume that the object provides default values. Here is a way to do this:
Start by writing a trait with all the required fields:
trait MyTrait {
def map: HashMap[String, Int]
def x: Int
def foo: String
}
Then write an object with the defaults:
object MyDefaults extends MyTrait {
  val map = HashMap[String, Int]()
  val x = -1
  val foo = ""
}
Finally, write an implementation that deserializes the data if it exists:
object MyData extends MyTrait {
private lazy val loadedData: Option[MyTrait] = {
if( /* filename exists */ ) Some( /*unserialize filename as MyTrait*/)
else None
}
lazy val map = loadedData.getOrElse( MyDefaults ).map
lazy val x = loadedData.getOrElse( MyDefaults ).x
lazy val foo = loadedData.getOrElse( MyDefaults ).foo
}
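The load-or-default idea can be sketched end to end with standard Java serialization. This is a simplified variant under stated assumptions, not the code above: the fields live in a hypothetical case class Settings instead of a singleton, the map is an immutable Map, and the file is passed in explicitly. The custom resolveClass from the question is left out and may still be needed in environments with unusual class loaders.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream}

object Persisted {
  // Case classes are Serializable, so the whole value is written in one call.
  case class Settings(map: Map[String, Int], x: Int, foo: String)

  val defaults: Settings = Settings(Map.empty, -1, "")

  def save(file: File, s: Settings): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(s) finally out.close()
  }

  // Falls back to the defaults when the file has not been written yet.
  def loadOrDefault(file: File): Settings =
    if (!file.exists()) defaults
    else {
      val in = new ObjectInputStream(new FileInputStream(file))
      try in.readObject().asInstanceOf[Settings] finally in.close()
    }
}
```

A singleton can then expose the loaded value through a lazy val, in the same way MyData delegates to loadedData above.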