I am quite new to the scala programming language, and I currently need to do the following. I have a signleton object like the following:
object MyObject extends Serializable {
val map: HashMap[String, Int] = null
val x: int = -1;
val foo: String = ""
}
Now i want to avoid to have to serialize each field of this object separately, thus I was considering writing the whole object to a file, and then, in the next execution of the program, read the file and initialize the singleton object from there. Is there any way to do this?
Basically what I want is when the serialization file doesn't exist, those variables to be initialized to new structures, while when it exists, the fields to be initialized from the ones on the file. But I want to avoid having to serialize/deserialize every field manually...
UPDATE:
I had to use a custom deserializer as presented here: https://issues.scala-lang.org/browse/SI-2403, since i had issues with a custom class I use inside the HashMap as values.
UPDATE2:
Here is the code I use to serialize:
val store = new ObjectOutputStream(new FileOutputStream(new File("foo")))
store.writeObject(MyData)
store.close
And the code to deserialize (in a different file):
#transient private lazy val loadedData: MyTrait = {
if(new File("foo").exists()) {
val in = new ObjectInputStream(new FileInputStream("foo")) {
override def resolveClass(desc: java.io.ObjectStreamClass): Class[_] = {
try { Class.forName(desc.getName, false, getClass.getClassLoader) }
catch { case ex: ClassNotFoundException => super.resolveClass(desc) }
}
}
val obj = in.readObject().asInstanceOf[MyTrait]
in.close
obj
}
else null
}
Thanks,
No needs to serialize an object with only immutable fields (because the compiler will do it for you...) I will assume that the object provides default values. Here is a way to do this:
Start by writing an trait with all the required fields:
trait MyTrait {
def map: HashMap[String, Int]
def x: Int
def foo: String
}
Then write an object with the defaults:
object MyDefaults extends MyTrait {
val map = Map()
val x = -1
val foo =
}
Finally write an implementation unserializing data if it exists:
object MyData extends MyTrait {
private lazy val loadedData: Option[MyTrait] = {
if( /* filename exists */ ) Some( /*unserialize filename as MyTrait*/)
else None
}
lazy val map = loadedData.getOrElse( MyDefault ).map
lazy val x = loadedData.getOrElse( MyDefault ).x
lazy val foo = loadedData.getOrElse( MyDefault ).foo
}
Related
I have an Enumeration in Scala
object Status extends Enumeration {
type Status = Value
val Success = Value
val Error = Value
}
This is used in the below -
case class Response(
status: Status,
errorMessage: String
)
I want to store Response in a file. So, I am using Jackson object mapper (com.fasterxml.jackson.databind.ObjectMapper ) to serialize it.
writeOutputToFile(filePath: Path , objectMapper.writeValueAsString(response))
However, object mapper writes an empty json to the file. I know object mapper requires a getter method to serialize. Is that why this is failing? Would I need a custom object mapper?
You can define generic serializer and deserializer for all enums and then for each of them register corresponding pairs of instances:
class EnumSerializer[T <: scala.Enumeration](e: T) extends JsonSerializer[T#Value] {
override def serialize(x: T#Value, jg: JsonGenerator, spro: SerializerProvider): Unit =
jg.writeString(x.toString)
}
class EnumDeserializer[T <: scala.Enumeration](e: T) extends JsonDeserializer[T#Value] {
private[this] val ec = new ConcurrentHashMap[String, T#Value]
override def deserialize(jp: JsonParser, ctxt: DeserializationContext): T#Value = Try {
val s = jp.getValueAsString
var x = ec.get(s)
if (x eq null) {
x = e.values.iterator.find(_.toString == s).get
ec.put(s, x)
}
x
}.getOrElse(ctxt.handleUnexpectedToken(classOf[T#Value], jp).asInstanceOf[T#Value])
}
val objectMapper: ObjectMapper with ScalaObjectMapper = {
val jsonFactory = new JsonFactoryBuilder()
.configure(...)
.build()
new ObjectMapper(jsonFactory) with ScalaObjectMapper {
registerModule(DefaultScalaModule)
registerModule(new SimpleModule()
.addSerializer(classOf[Status], new EnumSerializer(Status))
.addDeserializer(classOf[Status] new EnumDeserializer(Status))
...
)
}
}
The proposed solution is much safe and more efficient in runtime than just using of Status.withName(s) for each deserialization.
It will work even for the dynamic definition of enum values, like:
object Status extends Enumeration {
type Status = Value
val Success = Value
val Error = Value
def extra(name: String): Status = Value(nextId, name)
}
Status.extra("Unknown")
Full sources are here.
I'm looking for a way to convert a Scala singleton object given as a string (for example: package1.Main) to the actual instance of Main, so that I can invoke methods on it.
Example of the problem:
package x {
object Main extends App {
val objectPath: String = io.StdIn.readLine("Give an object: ") // user enters: x.B
// how to convert the objectPath (String) to a variable that references singleton B?
val b1: A = magicallyConvert1(objectPath)
b1.hi()
val b2: B.type = magicallyConvert2(objectPath)
b2.extra()
}
trait A {
def hi() = {}
}
object B extends A {
def extra() = {}
}
}
How can the magicallyConvert1 and magicallyConvert2 functions be implemented?
For a normal class, this can be done using something like:
val b: A = Class.forName("x.B").newInstance().asInstanceOf[A]
But I found a solution for singletons, using Java reflections:
A singleton is accesible in Java under the name:
package.SingletonName$.MODULE$
So you have to append "$.MODULE$", which is a static field.
So we can use standard Java reflections to get it.
So the solution is:
def magicallyConvert1(objectPath: String) = {
val clz = Class.forName(objectPath + "$")
val field = clz.getField("MODULE$")
val b: A = field.get(null).asInstanceOf[A]
b
}
def magicallyConvert2(objectPath: String) = {
val clz = Class.forName(objectPath + "$")
val field = clz.getField("MODULE$")
val b: B.type = field.get(null).asInstanceOf[B.type]
b
}
But it would be interesting to still see a solution with Scala-Reflect en Scala-Meta.
take a look at scalameta http://scalameta.org it does what you want and more
I am broadcasting a value in Spark Streaming application . But I am not sure how to access that variable in a different class than the class where it was broadcasted.
My code looks as follows:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = AppObject1
rdd.filter(p => obj.apply(p))
rdd.count
}
}
object AppObject1: Boolean{
def apply(str: String){
AnotherObject.process(str)
}
}
object AnotherObject{
// I want to use broadcast variable in this object
val B = broadcastA.Value // compilation error here
def process(): Boolean{
//need to use B inside this method
}
}
Can anyone suggest how to access broadcast variable in this case?
There is nothing particularly Spark specific here ignoring possible serialization issues. If you want to use some object it has to be available in the current scope and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
method argument
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mixed-in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on performance considerations that were introduced earlier.
Options proposed by zero233 are indeed very elegant way of doing this kind of things in Scala. At the same time it is important to understand implications of using certain patters in distributed system.
It is not the best idea to use mixin approach / any logic that uses enclosing class state. Whenever you use a state of enclosing class within lambdas Spark will have to serialize outer object. This is not always true but you'd better off writing safer code than one day accidentally blow up the whole cluster.
Being aware of this, I would personally go for explicit argument passing to the methods as this would not result in outer class serialization (method argument approach).
you can use classes and pass the broadcast variable to classes
your psudo code should look like :
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreach(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}
Is there a way to share a variable among all objects (instantiated from the same type)? Consider the following simple program. Two objects name and name2 have the same type A. Is there way to connect the properyList inside the two instantiation name and name2?
class A {
var properyList = List[String]()
def printProperties(): Unit = {
println(properyList)
}
}
object Experiment {
def main(args: Array[String]): Unit = {
val name = new A
val name2 = new A
name.properyList = List("a property")
name.printProperties()
name2.printProperties()
}
}
The output is
List(a property)
List()
Any way to change the class definition so that by just changing the .properyList in one of the objects, it is changed in all of the instatiations?
What you seem to be looking for is a class variable. Before I get into why you should avoid this, let me explain how you can do it:
You can attach propertyList to the companion object instead of the class:
object A {
var properyList = List[String]()
}
class A {
def printProperties(): Unit = {
println(A.properyList)
}
}
Now, to the why you shouldn't:
While scala let's you do pretty much anything that the JVM is capable of, its aims are to encourage a functional programming style, which generally eschews mutable state, especially shared, mutable state. I.e. the anti-pattern in A is not only that propertyList is a var, not a val but by sharing it via the companion object, you further allow anyone, from any thread to change the state of all instances at anytime.
The benefit of declaring your data as val is that you can safely pass it around, since you can be sure that nobody can change from under you at any time in the future.
You seem to be looking for something like java static fields.
In scala you usually achieve something like that by using a companion object:
object Main extends App {
class A {
import A._
def printProperties(): Unit = {
println(properyList)
}
}
object A {
private var properyList = List[String]()
def addProperty(prop: String): Unit = {
properyList ::= prop
}
}
val name = new A
val name2 = new A
A.addProperty("a property")
name.printProperties()
name2.printProperties()
}
If you want to have something similar to java's static fields you will have to use companion objects.
object Foo {
private var counter = 0
private def increment = {
counter += 1;
counter
}
}
class Foo {
val i = Foo.increment
println(i)
}
Code copied from:
"Static" field in Scala companion object
http://daily-scala.blogspot.com/2009/09/companion-object.html
Based on Arne Claassen's answer, but using private mutable collection with the companion object, which makes it visible only to the companion classes. Very simplistic example tried out in scala 2.11.7 console:
scala> :paste
// Entering paste mode (ctrl-D to finish)
object A {
private val mp = scala.collection.mutable.Map("a"->1)
}
class A {
def addToMap(key:String, value:Int) = { A.mp += (key -> value) }
def getValue(key:String) = A.mp.get(key)
}
// Exiting paste mode, now interpreting.
defined object A
defined class A
// create a class instance, verify it can access private map in object
scala> val a = new A
a: A = A#6fddee1d
scala> a.getValue("a")
res1: Option[Int] = Some(1)
// create another instance and use it to change the map
scala> val b = new A
b: A = A#5e36f335
scala> b.addToMap("b", 2)
res2: scala.collection.mutable.Map[String,Int] = Map(b -> 2, a -> 1)
// verify that we cannot access the map directly
scala> A.mp // this will fail
<console>:12: error: value mp is not a member of object A
A.mp
^
// verify that the previously created instance sees the updated map
scala> a.getValue("b")
res4: Option[Int] = Some(2)
I wrote some Scala code, using reflection, that returns all vals in an object that are of a certain type. Below are three versions of this code. One of them works but is ugly. Two attempts to improve it don't work, in very different ways. Can you explain why?
First, the code:
import scala.reflect.runtime._
import scala.util.Try
trait ScopeBase[T] {
// this version tries to generalize the type. The only difference
// from the working version is [T] instead of [String]
def enumerateBase[S: universe.TypeTag]: Seq[T] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[T])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
}
trait ScopeString extends ScopeBase[String] {
// This version works but requires passing the val type
// (String, in this example) explicitly. I don't want to
// duplicate the code for different val types.
def enumerate[S: universe.TypeTag]: Seq[String] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
// This version tries to avoid passing the object's type
// as the [S] type parameter. After all, the method is called
// on the object itself; so why pass the type?
def enumerateThis: Seq[String] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[this.type].decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
}
// The working example
object Test1 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerate[Test1.type]
}
// This shows how the attempt to generalize the type doesn't work
object Test2 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateBase[Test2.type]
}
// This shows how the attempt to drop the object's type doesn't work
object Test3 extends ScopeString {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateThis
}
val test1 = Test1.fields // List(test)
val test2 = Test2.fields // List(13, test)
val test3 = Test3.fields // List()
The "enumerate" method does work. However, as you can see from the Test1 example, it requires passing the object's own type (Test1.type) as a parameter, which should not have been necessary. The "enumerateThis" method tries to avoid that but fails, producing an empty list. The "enumerateBase" method attempts to generalize the "enumerate" code by passing the val type as a parameter. But it fails, too, producing the list of all vals, not just those of a certain type.
Any idea what's going on?
Your problem in your generic implementation is the loss of the type information of T. Also, don't use exceptions as your primary method of control logic (it's very slow!). Here's a working version of your base.
abstract class ScopeBase[T : universe.TypeTag, S <: ScopeBase[T, S] : universe.TypeTag : scala.reflect.ClassTag] {
self: S =>
def enumerateBase: Seq[T] = {
val mirror = currentMirror.reflect(this)
universe.typeOf[S].baseClasses.map(_.asType.toType).flatMap(
_.decls
.filter(_.typeSignature.resultType <:< universe.typeOf[T])
.filter(_.isMethod)
.map(_.asMethod)
.filter(_.isAccessor)
.map(decl => mirror.reflectMethod(decl).apply().asInstanceOf[T])
.filter(_ != null)
).toSeq
}
}
trait Inherit {
val StringField2: String = "test2"
}
class Test1 extends ScopeBase[String, Test1] with Inherit {
val IntField: Int = 13
val StringField: String = "test"
lazy val fields = enumerateBase
}
object Test extends App {
println(new Test1().fields)
}
Instead of getting the type from universe.typeOf you can use the runtime class currentMirror.classSymbol(getClass).toType, below is an example that works:
def enumerateThis: Seq[String] = {
val mirror = currentMirror.reflect(this)
currentMirror.classSymbol(getClass).toType.decls.map {
decl => Try(mirror.reflectField(decl.asMethod).get.asInstanceOf[String])
}.filter(_.isSuccess).map(_.get).filter(_ != null).toSeq
}
//prints List(test)
With everyone's help, here's the final version that works:
import scala.reflect.runtime.{currentMirror, universe}
abstract class ScopeBase[T: universe.TypeTag] {
lazy val enumerate: Seq[T] = {
val mirror = currentMirror.reflect(this)
currentMirror.classSymbol(getClass).baseClasses.map(_.asType.toType).flatMap {
_.decls
.filter(_.typeSignature.resultType <:< universe.typeOf[T])
.filter(_.isMethod)
.map(_.asMethod)
.filterNot(_.isConstructor)
.filter(_.paramLists.size == 0)
.map(decl => mirror.reflectField(decl.asMethod).get.asInstanceOf[T])
.filter(_ != null).toSeq
}
}
}
trait FieldScope extends ScopeBase[Field[_]]
trait DbFieldScope extends ScopeBase[DbField[_, _]] {
// etc....
}
As you see from the last few lines, my use cases are limited to scope objects for specific field types. This is why I want to parameterize the scope container. If I wanted to enumerate the fields of multiple types in a single scope container, then I would have parameterized the enumerate method.