Access Spark broadcast variable in different classes - scala

I am broadcasting a value in a Spark Streaming application, but I am not sure how to access that variable in a class other than the one where it was broadcast.
My code looks as follows:
object AppMain {
  def main(args: Array[String]): Unit = {
    //...
    val broadcastA = sc.broadcast(a)
    //..
    lines.foreachRDD(rdd => {
      val obj = AppObject1
      rdd.filter(p => obj.apply(p))
      rdd.count
    })
  }
}
object AppObject1 {
  def apply(str: String): Boolean = {
    AnotherObject.process(str)
  }
}
object AnotherObject {
  // I want to use broadcast variable in this object
  val B = broadcastA.value // compilation error here
  def process(str: String): Boolean = {
    // need to use B inside this method
  }
}
Can anyone suggest how to access broadcast variable in this case?

Nothing here is particularly Spark-specific, apart from possible serialization issues. If you want to use some object, it has to be available in the current scope, and you can achieve that in the usual ways:
You can define your helpers in a scope where the broadcast variable is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
You can pass it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
You can pass it as a method argument:
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
Or you can even mix in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}

Just a short take on the performance considerations that were introduced earlier.
The options proposed by zero233 are indeed a very elegant way of doing this kind of thing in Scala. At the same time, it is important to understand the implications of using certain patterns in a distributed system.
It is not the best idea to use the mixin approach, or any logic that uses the state of an enclosing class. Whenever you use the state of an enclosing class within a lambda, Spark will have to serialize the outer object. This is not always true, but you are better off writing safer code than accidentally blowing up the whole cluster one day.
Being aware of this, I would personally go for explicit argument passing to the methods (the method argument approach), as this does not result in serialization of the outer class.
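The capture behaviour can be reproduced without a cluster. Below is a minimal sketch in plain Scala (2.12 or later assumed), using Java serialization as a stand-in for what Spark does to closures. The names ClosureCaptureDemo, Outer and serializes are hypothetical and are not part of any Spark API.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical helper: true if Java serialization of `obj` succeeds.
  def serializes(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Deliberately NOT Serializable, standing in for a driver-side class.
  class Outer {
    val factor = 2

    // Reads the field directly, so the closure captures `this` (the whole Outer).
    def capturing: Int => Int = n => n * factor

    // Copies the field into a local first, so only an Int is captured.
    def local: Int => Int = {
      val f = factor
      n => n * f
    }
  }

  def main(args: Array[String]): Unit = {
    val outer = new Outer
    println("closure using enclosing state serializes: " + serializes(outer.capturing))
    println("closure using a local copy serializes: " + serializes(outer.local))
  }
}
```

Passing the value explicitly (as a method or constructor argument) has the same effect as the local copy: the closure only carries what it was handed.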

You can use classes and pass the broadcast variable to them.
Your pseudocode should look like this:
import org.apache.spark.broadcast.Broadcast
object AppMain {
  def main(args: Array[String]): Unit = {
    //...
    val broadcastA = sc.broadcast(a)
    //..
    lines.foreachRDD(rdd => {
      val obj = new AppObject1(broadcastA)
      rdd.filter(p => obj.apply(p))
      rdd.count
    })
  }
}
// Serializable because the instance is captured by the closure passed to rdd.filter
class AppObject1(bc: Broadcast[String]) extends Serializable {
  val anotherObject = new AnotherObject(bc)
  def apply(str: String): Boolean = {
    anotherObject.process(str)
  }
}
class AnotherObject(bc: Broadcast[String]) extends Serializable {
  // the broadcast variable is available in this object via the constructor
  def process(str: String): Boolean = {
    val a = bc.value
    // need to use a inside this method
    true
  }
}

Related

Is it possible to declare a val before assignment/initialization in Scala?

In general, can you declare a val in Scala before assigning it a value? If not, why not? An example where this might be useful (in my case at least) is that I want to declare a val that will be available in a larger scope than the one in which I assign it. If I cannot do this, how can I achieve the desired behavior?
And I want this to be a val, not a var because after it is assigned, it should NEVER change, so a var isn't ideal.
For example:
object SomeObject {
val theValIWantToDeclare // I don't have enough info to assign it here
def main(): Unit = {
theValIWantToDeclare = "some value"
}
def someOtherFunc(): Unit {
val blah = someOperationWith(theValIWantToDeclare)
}
}
object SomeObject {
private var tviwtdPromise: Option[Int] = None
lazy val theValIWantToDeclare: Int = tviwtdPromise.get
private def declareTheVal(v: Int): Unit = {
tviwtdPromise = Some(v)
theValIWantToDeclare
}
def main(args: Array[String]): Unit = {
declareTheVal(42)
someOtherFunction()
}
def someOtherFunction(): Unit = {
println(theValIWantToDeclare)
}
}
It will crash with a NoSuchElementException if you try to use theValIWantToDeclare before fulfilling the "promise" with declareTheVal.
It sounds to me that you need a lazy val.
A lazy val is populated on demand and the result is cached for all subsequent calls.
https://blog.codecentric.de/en/2016/02/lazy-vals-scala-look-hood/
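To make the "populated on demand, then cached" behaviour concrete, here is a small sketch. The names LazyValDemo, Holder and evaluations are made up for illustration; the counter exists only to show how often the initializer runs.

```scala
object LazyValDemo {
  // Hypothetical holder class, used only for illustration.
  class Holder {
    var evaluations = 0

    // The initializer runs on first access only; the result is then cached.
    lazy val theVal: String = {
      evaluations += 1
      "some value"
    }
  }

  def main(args: Array[String]): Unit = {
    val h = new Holder
    println(h.evaluations) // initializer has not run yet
    println(h.theVal)      // first access runs the initializer
    println(h.theVal)      // second access reuses the cached result
    println(h.evaluations)
  }
}
```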
Why not define a SomeObjectPartial that is partially constructed, and class SomeObject(theVal) that takes the value as a parameter?
Then your program has two states, one with the partial object, and another with the completed object.
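A rough sketch of that two-state idea, assuming the value is a String as in the question. TwoStateDemo and the complete method are hypothetical names, and the body of someOtherFunc is a placeholder.

```scala
object TwoStateDemo {
  // Complete state: theVal is an ordinary val, fixed at construction.
  final class SomeObject(val theVal: String) {
    def someOtherFunc(): String = "using " + theVal
  }

  // Partial state: exists before the value is known and can only
  // produce a complete object once the value is supplied.
  final class SomeObjectPartial {
    def complete(theVal: String): SomeObject = new SomeObject(theVal)
  }

  def main(args: Array[String]): Unit = {
    val partial = new SomeObjectPartial
    val full = partial.complete("some value")
    println(full.someOtherFunc())
  }
}
```

Code that needs theVal takes a SomeObject, so the compiler rules out using the value before it has been assigned.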

Sharing variables among objects in Scala

Is there a way to share a variable among all objects (instantiated from the same type)? Consider the following simple program. Two objects, name and name2, have the same type A. Is there a way to connect the properyList inside the two instances, name and name2?
class A {
var properyList = List[String]()
def printProperties(): Unit = {
println(properyList)
}
}
object Experiment {
def main(args: Array[String]): Unit = {
val name = new A
val name2 = new A
name.properyList = List("a property")
name.printProperties()
name2.printProperties()
}
}
The output is
List(a property)
List()
Is there any way to change the class definition so that changing .properyList in one of the objects changes it in all of the instances?
What you seem to be looking for is a class variable. Before I get into why you should avoid this, let me explain how you can do it:
You can attach properyList to the companion object instead of the class:
object A {
var properyList = List[String]()
}
class A {
def printProperties(): Unit = {
println(A.properyList)
}
}
Now, to why you shouldn't:
While Scala lets you do pretty much anything that the JVM is capable of, it aims to encourage a functional programming style, which generally eschews mutable state, especially shared mutable state. That is, the anti-pattern in A is not only that properyList is a var rather than a val: by sharing it via the companion object, you also allow anyone, from any thread, to change the state of all instances at any time.
The benefit of declaring your data as a val is that you can safely pass it around, since you can be sure that nobody can change it from under you at any time in the future.
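For comparison, here is a sketch of the immutable alternative: the list is built once and handed to every instance through the constructor. This is only one possible arrangement, and SharedValDemo is a made-up name.

```scala
object SharedValDemo {
  // The list is a constructor parameter held in a val, so no instance can replace it.
  class A(val properyList: List[String]) {
    def printProperties(): Unit = println(properyList)
  }

  def main(args: Array[String]): Unit = {
    val shared = List("a property")
    val name = new A(shared)
    val name2 = new A(shared)
    name.printProperties()  // prints List(a property)
    name2.printProperties() // prints List(a property)
  }
}
```

Both instances see the same data, and since List is immutable, neither of them (nor any other thread) can change it behind the other's back.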
You seem to be looking for something like Java static fields.
In Scala you usually achieve something like that by using a companion object:
object Main extends App {
class A {
import A._
def printProperties(): Unit = {
println(properyList)
}
}
object A {
private var properyList = List[String]()
def addProperty(prop: String): Unit = {
properyList ::= prop
}
}
val name = new A
val name2 = new A
A.addProperty("a property")
name.printProperties()
name2.printProperties()
}
If you want something similar to Java's static fields, you will have to use companion objects.
object Foo {
private var counter = 0
private def increment = {
counter += 1;
counter
}
}
class Foo {
val i = Foo.increment
println(i)
}
Code copied from:
"Static" field in Scala companion object
http://daily-scala.blogspot.com/2009/09/companion-object.html
Based on Arne Claassen's answer, but using a private mutable collection in the companion object, which makes it visible only to the companion class. A very simplistic example, tried out in the Scala 2.11.7 console:
scala> :paste
// Entering paste mode (ctrl-D to finish)
object A {
private val mp = scala.collection.mutable.Map("a"->1)
}
class A {
def addToMap(key:String, value:Int) = { A.mp += (key -> value) }
def getValue(key:String) = A.mp.get(key)
}
// Exiting paste mode, now interpreting.
defined object A
defined class A
// create a class instance, verify it can access private map in object
scala> val a = new A
a: A = A@6fddee1d
scala> a.getValue("a")
res1: Option[Int] = Some(1)
// create another instance and use it to change the map
scala> val b = new A
b: A = A@5e36f335
scala> b.addToMap("b", 2)
res2: scala.collection.mutable.Map[String,Int] = Map(b -> 2, a -> 1)
// verify that we cannot access the map directly
scala> A.mp // this will fail
<console>:12: error: value mp is not a member of object A
A.mp
^
// verify that the previously created instance sees the updated map
scala> a.getValue("b")
res4: Option[Int] = Some(2)

Dynamic object method invocation using reflection in scala

I'm looking to create a way to dynamically call logic depending on a template id within Scala. So template id 1 calls logic a, template id 2 calls logic b, etc. The logic will be diverse but will have the same inputs/outputs. Also, the number of different template ids will run into the thousands and will not be known ahead of time, so loose coupling feels like the way to go.
I've started looking at reflection to do this using Scala 2.11.1. I can use reflection statically when I know the logic to be used ahead of time, but I have not found the correct way to use reflection dynamically, so that, for example, passing in template id 2 will call logic b.
Below is a cut down example showing how the static version works and the skeleton I have so far for the dynamic version.
package thePackage
import scala.reflect.runtime.{universe => ru}
trait theTrait { def theMethod(x: String): Unit }
// the different logic held in different objects
object object1 extends theTrait {
def theMethod(x: String) = { println("a " + x ) }
}
object object2 extends theTrait {
def theMethod(x: String) = { println("b " + x ) }
}
object object3 extends theTrait {
def theMethod(x: String) = { println("c " + x ) }
}
// run static/dynamic reflection methods
object ReflectionTest {
// "static" invocation calling object1.theMethod
def staticInvocation() = {
val m = ru.runtimeMirror(getClass.getClassLoader)
val im = m.reflect(thePackage.object1)
val method = ru.typeOf[thePackage.object1.type]
.decl(ru.TermName("theMethod")).asMethod
val methodRun = im.reflectMethod(method)
methodRun("test")
}
staticInvocation
// "dynamic" invocation using integer to call different methods
def dynamicInvocation( y: Integer) = {
val m = ru.runtimeMirror(getClass.getClassLoader)
val module = m.staticModule("thePackage.object" + y)
val im = m.reflectModule(module)
// stuck... static approach does not work here
}
dynamicInvocation(1)
dynamicInvocation(2)
dynamicInvocation(3)
}
What needs to be added/changed to the dynamicInvocation method to make this work, or should I be using a different approach?
You need to get an instance mirror for your module, on which you can reflect the method.
def dynamicInvocation( y: Integer) = {
val m = ru.runtimeMirror(getClass.getClassLoader)
val module = m.staticModule("thePackage.object" + y)
val im = m.reflectModule(module)
val method = im.symbol.info.decl(ru.TermName("theMethod")).asMethod
val objMirror = m.reflect(im.instance)
objMirror.reflectMethod(method)("test")
}
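Since all the logic objects extend a common trait, an alternative worth considering is to look up only the object by name and then call the method through the trait, without reflecting on the method itself. The sketch below uses plain Java reflection and relies on the fact that an object which is top-level or nested in another object compiles to a class whose name ends in $ and which has a static MODULE$ field. All names here (DynamicDispatchDemo, TheTrait, Handler1, Handler2, lookup) are hypothetical, and the method returns a String instead of printing so that the result is easy to check.

```scala
object DynamicDispatchDemo {
  trait TheTrait { def theMethod(x: String): String }

  // The different logic, held in different objects.
  object Handler1 extends TheTrait { def theMethod(x: String): String = "a " + x }
  object Handler2 extends TheTrait { def theMethod(x: String): String = "b " + x }

  // Derive the JVM class-name prefix from a known handler, so the lookup
  // does not hard-code the enclosing package or object name.
  private val prefix: String = Handler1.getClass.getName.stripSuffix("1$")

  // Load the module class for the given id and read its singleton instance.
  def lookup(id: Int): TheTrait = {
    val cls = Class.forName(prefix + id + "$", true, getClass.getClassLoader)
    cls.getField("MODULE$").get(null).asInstanceOf[TheTrait]
  }

  def main(args: Array[String]): Unit = {
    println(lookup(1).theMethod("test"))
    println(lookup(2).theMethod("test"))
  }
}
```

An unknown id surfaces as a ClassNotFoundException from Class.forName, which the caller would need to handle.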
The TermName method and info.decl used in the solution above did not work for me. They were introduced in Scala 2.11; on Scala 2.10 the equivalents are newTermName and typeSignature.member. The line below worked for me:
val method = im.symbol.typeSignature.member(ru.newTermName("testMethod")).asMethod

Scala Pickling: Writing a custom pickler / unpickler for nested structures

I'm trying to write a custom SPickler / Unpickler pair to work around some of the current limitations of scala-pickling.
The data type I'm trying to pickle is a case class, where some of the fields already have their own SPickler and Unpickler instances.
I'd like to use these instances in my custom pickler, but I don't know how.
Here's an example of what I mean:
// Here's a class for which I want a custom SPickler / Unpickler.
// One of its fields can already be pickled, so I'd like to reuse that logic.
case class MyClass[A: SPickler: Unpickler: FastTypeTag](myString: String, a: A)
// Here's my custom pickler.
class MyClassPickler[A: SPickler: Unpickler: FastTypeTag](
implicit val format: PickleFormat) extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
override def pickle(
picklee: MyClass[A],
builder: PBuilder) {
builder.beginEntry(picklee)
// Here we save `myString` in some custom way.
builder.putField(
"mySpecialPickler",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(
picklee.myString).endEntry())
// Now we need to save `a`, which has an implicit SPickler.
// But how do we use it?
builder.endEntry()
}
override def unpickle(
tag: => FastTypeTag[_],
reader: PReader): MyClass[A] = {
reader.beginEntry()
// First we read the string.
val myString = reader.readField("mySpecialPickler").unpickle[String]
// Now we need to read `a`, which has an implicit Unpickler.
// But how do we use it?
val a: A = ???
reader.endEntry()
MyClass(myString, a)
}
}
I would really appreciate a working example.
Thanks!
Here is a working example:
case class MyClass[A](myString: String, a: A)
Note that the type parameter of MyClass does not need context bounds. Only the custom pickler class needs the corresponding implicits:
class MyClassPickler[A](implicit val format: PickleFormat, aTypeTag: FastTypeTag[A],
aPickler: SPickler[A], aUnpickler: Unpickler[A])
extends SPickler[MyClass[A]] with Unpickler[MyClass[A]] {
private val stringUnpickler = implicitly[Unpickler[String]]
override def pickle(picklee: MyClass[A], builder: PBuilder) = {
builder.beginEntry(picklee)
builder.putField("myString",
b => b.hintTag(FastTypeTag.ScalaString).beginEntry(picklee.myString).endEntry()
)
builder.putField("a",
b => {
b.hintTag(aTypeTag)
aPickler.pickle(picklee.a, b)
}
)
builder.endEntry()
}
override def unpickle(tag: => FastTypeTag[_], reader: PReader): MyClass[A] = {
reader.hintTag(FastTypeTag.ScalaString)
val tag = reader.beginEntry()
val myStringUnpickled = stringUnpickler.unpickle(tag, reader).asInstanceOf[String]
reader.endEntry()
reader.hintTag(aTypeTag)
val aTag = reader.beginEntry()
val aUnpickled = aUnpickler.unpickle(aTag, reader).asInstanceOf[A]
reader.endEntry()
MyClass(myStringUnpickled, aUnpickled)
}
}
In addition to the custom pickler class, we also need an implicit def which returns a pickler instance specialized for concrete type arguments:
implicit def myClassPickler[A: SPickler: Unpickler: FastTypeTag](implicit pf: PickleFormat) =
new MyClassPickler

Scala serialization/deserialization of singleton object

I am quite new to the Scala programming language, and I currently need to do the following. I have a singleton object like this:
object MyObject extends Serializable {
val map: HashMap[String, Int] = null
val x: Int = -1
val foo: String = ""
}
Now I want to avoid having to serialize each field of this object separately, so I was considering writing the whole object to a file and then, on the next execution of the program, reading the file and initializing the singleton object from it. Is there any way to do this?
Basically, what I want is for those variables to be initialized to new structures when the serialization file doesn't exist, and to be initialized from the file when it does. But I want to avoid having to serialize/deserialize every field manually...
UPDATE:
I had to use a custom deserializer as presented here: https://issues.scala-lang.org/browse/SI-2403, since I had issues with a custom class I use inside the HashMap as values.
UPDATE2:
Here is the code I use to serialize:
val store = new ObjectOutputStream(new FileOutputStream(new File("foo")))
store.writeObject(MyData)
store.close
And the code to deserialize (in a different file):
@transient private lazy val loadedData: MyTrait = {
if(new File("foo").exists()) {
val in = new ObjectInputStream(new FileInputStream("foo")) {
override def resolveClass(desc: java.io.ObjectStreamClass): Class[_] = {
try { Class.forName(desc.getName, false, getClass.getClassLoader) }
catch { case ex: ClassNotFoundException => super.resolveClass(desc) }
}
}
val obj = in.readObject().asInstanceOf[MyTrait]
in.close
obj
}
else null
}
Thanks,
There is no need to serialize an object that has only immutable fields (because the compiler will do it for you...). I will assume that the object provides default values. Here is a way to do this:
Start by writing a trait with all the required fields:
trait MyTrait {
def map: HashMap[String, Int]
def x: Int
def foo: String
}
Then write an object with the defaults:
object MyDefaults extends MyTrait {
  val map = HashMap[String, Int]()
  val x = -1
  val foo = ""
}
Finally, write an implementation that deserializes the data if it exists:
object MyData extends MyTrait {
  private lazy val loadedData: Option[MyTrait] = {
    if( /* filename exists */ ) Some( /* deserialize filename as MyTrait */ )
    else None
  }
  // fall back to the defaults when nothing could be loaded
  lazy val map = loadedData.getOrElse( MyDefaults ).map
  lazy val x = loadedData.getOrElse( MyDefaults ).x
  lazy val foo = loadedData.getOrElse( MyDefaults ).foo
}
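A compact, runnable variant of the same load-or-default idea is sketched below. It swaps the trait and objects for a single case class (case classes are Serializable out of the box) and uses an immutable Map, so it is not a drop-in replacement for the code above. LoadOrDefaultDemo, Settings, save and loadOrDefault are hypothetical names; the resolveClass override is the one from the question's UPDATE2.

```scala
import java.io.{File, FileInputStream, FileOutputStream, ObjectInputStream, ObjectOutputStream, ObjectStreamClass}

object LoadOrDefaultDemo {
  // Hypothetical data holder: the whole value is written and read in one go.
  final case class Settings(map: Map[String, Int], x: Int, foo: String)

  val defaults: Settings = Settings(Map.empty, -1, "")

  def save(file: File, s: Settings): Unit = {
    val out = new ObjectOutputStream(new FileOutputStream(file))
    try out.writeObject(s) finally out.close()
  }

  // Returns the stored settings if the file exists, the defaults otherwise.
  def loadOrDefault(file: File): Settings =
    if (file.exists()) {
      val in = new ObjectInputStream(new FileInputStream(file)) {
        // Resolve classes through this class's loader first (as in the question).
        override def resolveClass(desc: ObjectStreamClass): Class[_] =
          try Class.forName(desc.getName, false, getClass.getClassLoader)
          catch { case _: ClassNotFoundException => super.resolveClass(desc) }
      }
      try in.readObject().asInstanceOf[Settings] finally in.close()
    } else defaults

  def main(args: Array[String]): Unit = {
    val file = File.createTempFile("settings", ".bin")
    file.delete()                // start without a file
    println(loadOrDefault(file)) // falls back to the defaults
    save(file, Settings(Map("a" -> 1), 7, "bar"))
    println(loadOrDefault(file)) // now read from the file
    file.delete()
  }
}
```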