I have a case class for configuration parameters which is populated (using NO external library) before the actual application starts.
I pass this config object throughout the application, in far too many places.
Now the question is: can this object be made global so I can refer to it across the application, given that the values are going to be constant?
import java.util.Date

case class ConfigParam() extends Serializable {
  var JobId: Int = 0
  var jobName: String = null
  var snapshotDate: Date = null
}
val configParam = ???
val ss = getSparkSession(configParam) //Method call...
Using ConfigParam as a global object could have bad implications for you. First of all, it will make it harder to test any function that uses that global object.
Maybe you could just pass ConfigParam as an implicit argument?
For example, let's say you've got 3 functions:
def funA(name: String)(implicit configParam: ConfigParam): String = ???
def funB(number: Int)(implicit configParam: ConfigParam): String = ???

//you don't have to explicitly pass configParam to funA or funB
def funC(name: String)(implicit configParam: ConfigParam): String = funA(name) + funB(100)

implicit val configParam = ??? //you need to initialise configParam as an implicit val

funC("somename") //you can now just call funC without explicitly passing configParam
//it will also be passed to all function calls inside funC,
//as long as they've got an implicit parameter list with ConfigParam
Another solution could be to use some kind of dependency-injection framework, like Guice.
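A minimal sketch of the Guice route, assuming the Guice library is on the classpath (ConfigModule and JobRunner are illustrative names, not from the original code):

import com.google.inject.{AbstractModule, Guice, Inject}

// Bind the already-populated config instance so Guice hands it out on demand.
class ConfigModule(config: ConfigParam) extends AbstractModule {
  override def configure(): Unit =
    bind(classOf[ConfigParam]).toInstance(config)
}

// Any class that needs the config declares it as a constructor dependency.
class JobRunner @Inject() (config: ConfigParam) {
  def run(): Unit = println(s"running job ${config.jobName}")
}

val injector = Guice.createInjector(new ConfigModule(configParam))
injector.getInstance(classOf[JobRunner]).run()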
Related
I have a working Scala application in production that uses an object with several methods defined inside it.
There are new requirements for this application where I will have to rewrite (override) a few of the methods from that object while reusing the definitions of the remaining methods.
How can I create a new object inheriting from the original one so that I can override the definitions of a few selected methods?
A Scala object cannot inherit from another Scala object, so the obvious approach is not possible.
If you can modify the original object, then create a class that implements all the functionality and make the original object inherit from that class. Your new object can then inherit from the same class and override the methods that you want to change (see the sketch below). However, this will create two copies of any values in the base class, so it is not suitable for an object that contains a lot of data or does any one-off initialisation.
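A minimal sketch of that approach (all names here are illustrative):

// Shared implementation lives in a class; both objects extend it.
class BaseLogic {
  val table: Seq[String] = Seq("a", "b") // note: duplicated in each object
  def greet: String = "hello"
  def answer: Int = 42
}

object Original extends BaseLogic

object Patched extends BaseLogic {
  override def greet: String = "bonjour" // only this method changes
  // answer and table are inherited unchanged
}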
If you cannot modify the original object then you will have to copy all the methods of the first object into your new object. vals can be copied directly. defs can be copied using eta expansion:
def v = Original.v // This is a simple value
def f = Original.f _ // This is a function value
Using def rather than val here will avoid storing multiple copies of the original values and will prevent lazy values from being computed until they are needed.
Using eta expansion will make f a function value rather than a method which may or may not be a problem depending on how it is used. If you require f to be a method then you will have to duplicate the function signature and call the original f:
def f(i: Int) = Original.f(i) // This is a method
My suggestion would be to move the code/logic to a trait or abstract class and have both objects extend it, as sketched below.
On the upside, this would also give you better testability.
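A minimal trait-based sketch (illustrative names):

trait Greetings {
  def hello: String = "hello"
  def bye: String = "bye"
}

object Original extends Greetings

object Patched extends Greetings {
  override def bye: String = "ciao" // overridden; hello is reused as-is
}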
Another, more hacky approach could be to not use the class/type system at all and just forward the methods using a new singleton object:
scala> object A {def foo: String = "foo" ; def bar:Int = 0}
defined object A
scala> object B { def foo = A.foo; def bar = "my new impl" }
defined object B
scala> A.foo
res3: String = foo
scala> B.foo
res4: String = foo
scala> A.bar
res5: Int = 0
scala> B.bar
res6: String = my new impl
Given the following String:
"println(\"Hello\")"
It is possible to use reflection to evaluate the code, as follows.
import scala.reflect.runtime.currentMirror
import scala.tools.reflect.ToolBox

object Eval {
  def apply[A](string: String): A = {
    val toolbox = currentMirror.mkToolBox()
    val tree = toolbox.parse(string)
    toolbox.eval(tree).asInstanceOf[A]
  }
}
However, let's say that the string contains an object with a function definition, such as:
"""object MyObj { def getX="X"}"""
Is there a way to use Scala reflection to compile the string, load it and run the function? What I have tried has not worked; if anyone has some example code, it would be much appreciated.
It depends on how strictly you define the acceptable input string. Should the object always be called MyObj? Should the method always be called getX? Should it always be 1 method or can it be multiple?
For the more general case you could try to extract all method names from the AST and generate calls to each one. The following code will call every method (and return the result of the last one) that takes 0 arguments and is not a constructor, in some object, not taking inheritance into account:
import scala.reflect.runtime.currentMirror
import scala.reflect.runtime.universe._
import scala.tools.reflect.ToolBox

def eval(string: String): Any = {
  val toolbox = currentMirror.mkToolBox()
  val tree = toolbox.parse(string)
  //oName is the name of the object
  //defs is the list of all method definitions
  val ModuleDef(_, oName, Template(_, _, defs)) = tree
  //only generate calls for non-constructor, zero-arg methods
  val defCalls = defs.collect {
    case DefDef(_, name, _, params, _, _)
      if name != termNames.CONSTRUCTOR && params.flatten.isEmpty => q"$oName.$name"
  }
  //put the method calls after the object definition
  val block = tree :: defCalls
  toolbox.eval(q"..$block")
}
And using it:
scala> eval("""object MyObj { def bar() = println("bar"); def foo(a: String) = println(a); def getX = "x" }""")
bar
res60: Any = x
I have to append a column generated by the method strToInt, which is turning out not to be serializable.
def strToInt(colVal: String): Int = {
  var str = new Array[String](3)
  str(0) = "icmp"; str(1) = "tcp"; str(2) = "udp"
  var i = 0
  for (i <- 0 to str.length - 1) {
    if (str(i) == colVal) { return i }
  }
  throw new IllegalStateException("This never happens")
}
val strtoint = udf(strToInt(_:String)).apply(col("Atr 1"))
val newDF = df.withColumn("newCol", strtoint)
I have tried putting the function in a helper class this way,
object Helper extends Serializable {
  def strToInt ...
}
but it doesn't help.
Change your code to the following, where the function is executed at the withColumn level (not when the UDF is defined).
// define a UDF
val strtoint = udf(strToInt _)
// use it (aka execute)
val newDF = df.withColumn("newCol", strtoint(col("Atr 1")))
That seemingly little change changes what you create and how you execute it afterwards.
As you may have noticed already, udf creates a user-defined function that Spark SQL understands (and can execute):
udf[RT, A1](f: (A1) ⇒ RT): UserDefinedFunction
Defines a user-defined function of 1 arguments as user-defined function (UDF).
(I removed the implicit parameters to ease comprehension)
Quoting the scaladoc of UserDefinedFunction:
A user-defined function. To create one, use the udf functions in functions.
I don't entirely agree with the wording, but the "protocol" is to define a UDF first, before you can execute it in your queries, say in withColumn or select operators.
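For completeness, a UDF can also be registered under a name and called from SQL text. A small sketch, assuming a SparkSession named spark and the df and strToInt from above (the view name packets is illustrative):

// register the Scala function as a SQL-callable UDF
spark.udf.register("strToInt", strToInt _)

df.createOrReplaceTempView("packets")
spark.sql("SELECT *, strToInt(`Atr 1`) AS newCol FROM packets").show()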
I'd also change strToInt to be more Scala-idiomatic (and hopefully easier to comprehend, too).
def strToInt(colVal: String): Int = {
  val strs = Array("icmp", "tcp", "udp")
  strs.indexOf(colVal)
}
Note that indexOf returns -1 when the value is not found, rather than throwing, so handle that case if it matters to you.
The key to understanding what's going on here is that while Scala is a functional programming language, it runs on the JVM, which has no native support for function types. At runtime, any val assigned an "anonymous" or "lambda" function will actually be an instance of an anonymous class with an apply method. So let's say you have the following:
object helper {
  val isNegative: (Int => Boolean) = (n: Int) => n < 0
}
This compiles to the same thing as this:
object helper {
  val isNegative: Function1[Int, Boolean] = new Function1[Int, Boolean] {
    def apply(n: Int): Boolean = n < 0
  }
}
isNegative is really an anonymous class instance extending the trait Function1. When you instead do this:
object helper {
  def isNegative(n: Int): Boolean = n < 0
}
Now isNegative is a method of the object helper instead. When it comes to dealing with Spark, if you were to do something like this:
// ds is a Dataset[Int]
ds.filter(isNegative)
In the first case, Spark will have to serialize the anonymous class assigned to isNegative and will fail because it is not serializable. In the second case, it has to serialize helper instead, which works because an object is serializable if all its state is serializable.
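To make the contrast concrete, here is a minimal sketch of the method form that serializes cleanly (assuming a Dataset[Int] named ds, as above):

object helper extends Serializable {
  def isNegative(n: Int): Boolean = n < 0
}

// Eta expansion turns the method into a function value whose only captured
// reference is the serializable helper object.
ds.filter(helper.isNegative _)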
To apply this to your problem, when you do this:
val strtoint = udf(strToInt(_:String)).apply(col("Atr 1"))
at runtime what strtoint is, is an anonymous class instance with the trait Function1[String, UserDefinedFunction], that is, a function that generates a UserDefinedFunction when it is called. With the underscore filled in, it is equivalent to this:
val strtoint: Function1[String, UserDefinedFunction] = new Function1[String, UserDefinedFunction] {
  def apply(t1: String) = udf(strToInt(t1: String)).apply(col("Atr 1"))
}
To minimally change your code, you can just change the val to a def:
def sti = udf(strToInt(_:String)).apply(col("Atr 1"))
Now sti is a member function of its enclosing class, and if that is serializable, you should be good as far as Spark is concerned. The other thing to keep in mind here is that strToInt also needs to be part of a serializable class or object.
The other way to fix this, as has been suggested, would be to change val strtoint to a UserDefinedFunction, which is a case class and thus serializable. However, you still need to make sure that strToInt is a member of a serializable class or object.
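A minimal sketch of that variant, assuming strToInt lives in the serializable Helper object shown in the question:

import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.{col, udf}

// The UserDefinedFunction itself serializes fine; Helper must still be
// serializable because the wrapped closure references it.
val strtoint: UserDefinedFunction = udf(Helper.strToInt _)
val newDF = df.withColumn("newCol", strtoint(col("Atr 1")))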
This problem seems to be similar to the problem I was experiencing (In Java).
My udf function was using Cipher library to encrypt something and the exception that was thrown is :
Caused by: java.io.NotSerializableException: javax.crypto.Cipher
Serialization stack:
	- object not serializable (class: javax.crypto.Cipher, value: javax.crypto.Cipher@625d02ce)
I could not add 'implements Serializable' to the Cipher class because it is a library class provided by Java.
I used the following solution from this link : spark-how-to-call-udf-over-dataset-in-java
private static UDF1<String, String> toUpper = new UDF1<String, String>() {
    public String call(final String str) throws Exception {
        return str.toUpperCase();
    }
};
Register the UDF and then you can use it via the callUDF function:
import static org.apache.spark.sql.functions.callUDF;
import static org.apache.spark.sql.functions.col;
sqlContext.udf().register("toUpper", toUpper, DataTypes.StringType);
peopleDF.select(col("name"),callUDF("toUpper", col("name"))).show();
In my case, instead of calling str.toUpperCase(), I called my Cipher instance.
Is it possible to add a member variable to a class from outside the class? (Or mimic this behavior?)
Here's an example of what I'm trying to do. I already use an implicit conversion to add additional functions to RDDs, so I added a variable to ExtendedRDDFunctions. I'm guessing this doesn't work because the variable is lost after the conversion in an rdd.setMember(string) call.
Is there any way to get this kind of functionality? Is this the wrong approach?
implicit def toExtendedRDDFunctions(rdd: RDD[Map[String, String]]): ExtendedRDDFunctions = {
  new ExtendedRDDFunctions(rdd)
}

class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) extends Logging with Serializable {
  var member: Option[String] = None

  def getMember(): String = {
    if (member.isDefined) {
      return member.get
    } else {
      return ""
    }
  }

  def setMember(field: String): Unit = {
    member = Some(field)
  }

  def queryForResult(query: String): String = {
    // Uses member here
  }
}
EDIT:
I am using these functions as follows: I first call rdd.setMember("state"), then rdd.queryForResult(expression).
Because the implicit conversion is applied each time you invoke a method defined in ExtendedRDDFunctions, a new instance of ExtendedRDDFunctions is created for every call to setMember and queryForResult. Those instances do not share any member variables.
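Here is a self-contained sketch of the effect, with plain strings instead of RDDs (all names are illustrative):

import scala.language.implicitConversions

class Wrapper(s: String) {
  private var member: Option[String] = None
  def setMember(v: String): Unit = member = Some(v)
  def getMember: String = member.getOrElse("")
}

implicit def wrap(s: String): Wrapper = new Wrapper(s)

val data = "hello"
data.setMember("state") // stored on a temporary Wrapper that is discarded
println(data.getMember) // a second, fresh Wrapper: prints ""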
You have basically two options:
Maintain a Map[RDD, String] in ExtendedRDDFunctions's companion object which you use to assign the member value to an RDD in setMember. This is the evil option, as you introduce global state and open the door to a whole range of errors.
Create a wrapper class that contains your member value and is returned by the setMember method:
case class RDDWithMember(rdd: RDD[Map[String, String]], member: String)
  extends RDD[Map[String, String]](rdd) {

  def queryForResult(query: String): String = {
    // Uses member here
  }

  // methods of the RDD interface (compute, getPartitions, ...), just delegate to rdd
}

implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
  def setMember(field: String): RDDWithMember = {
    RDDWithMember(rdd, field)
  }
}
Besides avoiding the global state, this approach is also more type-safe because you cannot call queryForResult on instances that do not have a member. The only downsides are that you have to delegate all members of RDD and that queryForResult is not defined on RDD itself.
The first issue can probably be addressed with some macro magic (search for "delegate" or "proxy" and "macro").
The latter issue can be resolved by defining an additional extension method in ExtendedRDDFunctions that checks whether the RDD is an RDDWithMember:
implicit class ExtendedRDDFunctions(rdd: RDD[Map[String, String]]) {
  def setMember(field: String): RDDWithMember = // ...

  def queryForResult(query: String): Option[String] = rdd match {
    case wm: RDDWithMember => Some(wm.queryForResult(query))
    case _ => None
  }
}
import ExtendedRDDFunctions._
will import all attributes and functions from the companion object so that they can be used in the body of your class.
For your usage, look up the delegate pattern; a small sketch follows.
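A minimal sketch of delegation (illustrative names, no Spark involved):

trait Queryable {
  def queryForResult(query: String): String
}

class Backend extends Queryable {
  def queryForResult(query: String): String = s"result of $query"
}

// The facade holds the extra state and forwards the actual work to its delegate.
class StatefulFacade(delegate: Queryable, member: String) extends Queryable {
  def queryForResult(query: String): String =
    delegate.queryForResult(s"$query [member=$member]")
}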
I've got a Scala def that takes parameters from an HTTP POST and parses the data. I'm pulling a "job" object from the database (the query was successful, as verified in the debugger, and the parameters are just as they need to be), and I'm trying to update that job object with the new parameters. However, trying to assign values is proving useless, since the job object retains all its original values.
All database objects are from Squeryl. Code below:
Edit: added class below and Job object to help give context in this Play! app
object Job {
  def updateFromParams(params: Params) = {
    val job = Job.get(params.get("job_id").toLong).get

    val comments = params.get("comments")
    val startTime = parseDateTime(params.get("start_time") + " " + params.get("date"))
    val endTime = parseDateTime(params.get("end_time") + " " + params.get("date"))
    val clientId = params.get("client_id").toLong
    val client = Client.get(clientId).get
    val name = params.get("job_name")
    val startAddressType = params.get("start_address_type")

    var startLocationId: Option[Long] = None
    val (startAddress, startCity, startProvince) = startAddressType match {
      case "client" => getClientAddress(clientId)
      case "custom" => (params.get("start_custom_address"),
                        params.get("start_custom_city"),
                        params.get("start_custom_province"))
      case id => {
        startLocationId = Some(id.toLong)
        getLocationAddress(startLocationId.get)
      }
    }

    job.comments -> comments
    job.startTime -> startTime
    job.endTime -> endTime
    job.clientId -> clientId
    job.name -> name
    job.startAddressType -> startAddressType
    job.startAddress -> startAddress
    job.startCity -> startCity
    job.startProvince -> startProvince

    Job.update(job)
  }
}
I'm stumped because if I try job.name -> name, nothing happens, and if I try job.name = name I get a Scala "reassignment to val" error. I get the same error when using var name instead of val name.
It's obviously a syntax issue on my part, what's the proper way to handle this? Thanks!
More Info: if this helps, here's the Job class used in our Play! app:
class Job(
  val id: Long,

  @Column("name")
  val name: String,

  @Column("end_time")
  val endTime: Timestamp,

  @Column("start_time")
  val startTime: Timestamp,

  @Column("client_id")
  val clientId: Long,

  @Column("start_address_type")
  var startAddressType: String,

  @Column("start_address")
  var startAddress: String,

  /* LOTS MORE LIKE THIS */
) extends KeyedEntity[Long] {
}
The -> in job.name -> name doesn't assign anything: it is the tuple-building operator, so the expression just constructs a pair and discards it. And job.name = name cannot work either, because job.name is an immutable property. You can see in the definition of the Job class that name is declared with val, meaning its value can never be changed. The only way to "change" the values of the job object is to actually create a totally new instance and discard the old one. This is standard practice when dealing with immutable objects.
Changing your local name from val to var won't matter, since you are only reading the value of that variable.
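A minimal sketch of the new-instance route, reusing the vals computed in updateFromParams (the remaining constructor fields are elided, matching the class above):

val updated = new Job(
  id = job.id,                       // unchanged
  name = name,                       // new value from params
  endTime = endTime,
  startTime = startTime,
  clientId = clientId,
  startAddressType = startAddressType,
  startAddress = startAddress
  /* ... remaining fields ... */
)
Job.update(updated)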
vals are immutable; in fact, the whole Job class is effectively immutable (since the fields in question are all vals).
What could be done is to create a case class JobW and a bit of pimping to allow the use of copy. That said:
class Job(val id: Long, val name: String) {}

case class JobW(override val id: Long, override val name: String) extends Job(id, name) {
  def ok: String = name + id
}

implicit def wrapJob(job: Job): JobW = JobW(job.id, job.name)

val job: Job = new Job(2L, "blah")

println(job.ok)
println(job.copy(name = "Blob"))
What I've done is wrap a (simplified, for the exercise) Job in a case class wrapper and define the implicit conversion.
Using this implicit conversion (what is called pimping), you'll have access to the ok method but also to the copy one.
The copy method is generated for every case class; it takes as many arguments as the case class has fields (each defaulted to the current value) and produces a new instance of the case class.
So you now have the ability to change just one value of your class, very simply, and get a new object back (as functional programming argues for immutability).