How to store a method's vals without recreating them on every method call - scala

I have a Scala class whose methods use a lot of regexes. Each method uses its own regex patterns.
From the perspective of code modularity, I should store those patterns in the method:
class Bar {
def foo() {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
/*...*/
}
}
But this approach is quite inefficient: the patterns are recompiled on each method call.
I could move them directly to the class:
class Bar {
val fooPatt1 = "[ab]+".r
val fooPatt2 = "[cd]+".r
/*...*/
}
but with 30 methods this looks ugly.
I ended up with a hybrid solution using a val and an anonymous function:
val z = {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
() => { /* ... */ }
}
but I am not sure whether using a val to store a function has drawbacks compared to def. Maybe there is another clean way to store a method's constants without polluting the class?

Using a val is perfectly fine. There might be a (very) small performance hit, but in the vast majority (99.9%) of applications that's not a problem.
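A minimal sketch of the val-plus-closure approach, assuming the method takes a String and returns a Boolean (the question elides the body, so that signature is made up for illustration). The block runs once per Bar instance, so the patterns are compiled once per instance rather than once per call:

```scala
import scala.util.matching.Regex

class Bar {
  // The outer block is evaluated once, when Bar is constructed.
  // The returned function closes over the already compiled patterns.
  val foo: String => Boolean = {
    val patt1: Regex = "[ab]+".r
    val patt2: Regex = "[cd]+".r
    (s: String) =>
      patt1.pattern.matcher(s).matches() || patt2.pattern.matcher(s).matches()
  }
}
```

Callers still write bar.foo("abab"), exactly as they would with a def.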
You could also create a class for the method
// The extends is not needed, although you might want to hide the Foo type
class Foo extends (() => ...) {
val patt1 = "[ab]+".r
val patt2 = "[cd]+".r
def apply() = {
...
}
}
Then in the class:
class Bar {
val foo = new Foo
}
Another solution is using traits
trait Foo {
private lazy val patt1 = "[ab]+".r
private lazy val patt2 = "[cd]+".r
def foo() = ...
}
class Bar extends Foo with ...
Note that if you have several different methods like that in a single class, it can be a sign that the single responsibility principle is violated. Moving them to their own class (or trait) can be a solution for that problem as well.

I would put every method, together with the regexes it needs, in its own trait:
class Bar extends AMethod with BMethod
trait AMethod {
private val aPattern = """\d+""".r
def aMethod(s: String) = aPattern.findFirstIn(s)
}
trait BMethod {
private val bPattern = """\w+""".r
def bMethod(s: String) = bPattern.findFirstIn(s)
}
clean
separated
easy to test (object AMethodSpec extends Properties("AMethod") with AMethod ...)

I took Chris's comment into account. Putting the patterns in the companion object is probably the most efficient approach, but it gets very unclean when there are more methods.
EECOLOR's solution is less efficient but cleaner. Traits prevent the patterns from being recreated on each method call. Unfortunately, Scala does not reuse the same compiled pattern across multiple class instances:
(new X).patt1==(new X).patt1 // would be false.
I've combined those two approaches and used objects instead of traits.
object X {
object method1 {
val patt1 = "a".r
}
object method2 {
val patt1 = "a".r
}
}
class X {
def method1 = {
import X.method1._
patt1
}
def method2 = {
import X.method2._
patt1
}
}
(new X).method1 == (new X).method1 // true
(new X).method2 == (new X).method2 // true
Although this approach works, I think Scala should provide a solution for this problem out of the box. Patterns are the simplest example; we could have other immutable objects whose initialization is much more expensive.
Extracting method internals somewhere outside the method is still unclear. It would be nice to do it the way lazy vals work: adding one modifier should ensure that the value is instantiated only once across all instances and method calls. It would be something like this:
def method1 {
static val x = new VeryExpensiveObject
}
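Until something like static val exists, the nested-object approach above covers the expensive-object case as well. A minimal sketch, where VeryExpensiveObject is a hypothetical stand-in that counts how often it is constructed:

```scala
// Hypothetical expensive object; the counter exists only for the demonstration.
final class VeryExpensiveObject {
  VeryExpensiveObject.constructed += 1
}
object VeryExpensiveObject {
  var constructed = 0
}

object X {
  // A nested object is initialized on first access and only once,
  // no matter how many X instances call method1.
  private object method1 {
    val x = new VeryExpensiveObject
  }
}

class X {
  // The companion class may read private members of its companion object.
  def method1: VeryExpensiveObject = X.method1.x
}
```

Two different X instances then hand back the very same object, and the constructor runs a single time.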

Related

Understanding closures or best way to take udf registrations' code out of main and put in utils

This is more of a Scala concept question than a Spark one. I have this Spark initialization code:
object EntryPoint {
val spark = SparkFactory.createSparkSession(...
val funcsSingleton = ContextSingleton[CustomFunctions] { new CustomFunctions(Some(hashConf)) }
lazy val funcs = funcsSingleton.get
//this part I want moved to another place since there are many many UDFs
spark.udf.register("funcName", udf {funcName _ })
}
The other class, CustomFunctions looks like this
class CustomFunctions(val hashConfig: Option[HashConfig], sark: Option[SparkSession] = None) {
val funcUdf = udf { funcName _ }
def funcName(colValue: String) = withDefinedOpt(hashConfig) { c =>
...}
}
^ This class is wrapped in a Serializable interface using ContextSingleton, which is defined like so:
class ContextSingleton[T: ClassTag](constructor: => T) extends AnyRef with Serializable {
val uuid = UUID.randomUUID.toString
@transient private lazy val instance = ContextSingleton.pool.synchronized {
ContextSingleton.pool.getOrElseUpdate(uuid, constructor)
}
def get = instance.asInstanceOf[T]
}
object ContextSingleton {
private val pool = new TrieMap[String, Any]()
def apply[T: ClassTag](constructor: => T): ContextSingleton[T] = new ContextSingleton[T](constructor)
def poolSize: Int = pool.size
def poolClear(): Unit = pool.clear()
}
Now to my problem: I don't want to have to explicitly register the UDFs as is done in the EntryPoint app. I create all UDFs as needed in my CustomFunctions class and then dynamically register only the ones that I read from the user-provided config. What would be the best way to achieve this? Also, I want to register the required UDFs outside the main app, but that throws the infamous TaskNotSerializable exception. Serializing the big CustomFunctions is not a good idea, hence I wrapped it in ContextSingleton, but my problem of registering UDFs outside cannot be solved that way. Please suggest the right approach.

Scala fails to initialize a val

I have found kind of a weirdness in the following Scala program (sorry to include all the code, but you'll see why I added it all) :
object md2html extends App {
private val DEFAULT_THEME = Themes.AMAZON_LIGHT
private val VALID_OPTIONS = Set("editorTheme", "logo", "style")
try {
// some code 1
} catch {
case t: Throwable => t.printStackTrace(); exitWithError(t.getMessage)
}
// some code 2 (method definitions only)
private def parseOption(key: String, value: String) = {
println(key + " " + VALID_OPTIONS)
if (! Set("theme","editorTheme", "logo", "style").contains(key)) exitWithError(s"$key is not a valid option")
if (key == "theme") Themes(value).toMap else Map(key.drop(2) -> value)
}
// some code 3 (method definitions only)
}
If VALID_OPTIONS is defined after one of the some code..., it is evaluated to null in parseOption. I can see no good reason for that. I truncated the code for clarity, but if some more code is required I'll be happy to add it.
EDIT : I looked a bit more into it, and here is what I found.
When extending App, the val is not initialized with this code
object Test extends App {
printTest
def printTest = println(test)
val test = "test"
}
With a regular main method, it works fine :
object Test {
def main(args: Array[String]): Unit = {
printTest
}
def printTest = println(test)
val test = "test"
}
I had overlooked that you use extends App. This is another pitfall in Scala, unfortunately:
object Foo extends App {
val bar = "bar"
}
Foo.bar // null!
Foo.main(Array())
Foo.bar // now initialized
The App trait defers the object's initialization to the invocation of the main method, so all the vals are null until the main method has been called.
In summary, the App trait and vals do not mix well. I have fallen into that trap many times. If you use App, avoid vals; if you have to use global state, use lazy vals instead.
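A small sketch of that advice, assuming the Scala 2 App behaviour described above (the object name LazyFoo is made up). A lazy val is computed on first access, so it does not depend on the deferred App body having run:

```scala
object LazyFoo extends App {
  // Computed the first time it is read, not as part of the
  // initialization code that App defers to main.
  lazy val bar = "bar"
}
```

Reading LazyFoo.bar from other code yields the string even if LazyFoo.main was never called.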
Constructor bodies, and this goes for singleton objects as well, are evaluated strictly top to bottom. This is a common pitfall in Scala, unfortunately, as it becomes relevant where the vals are defined if they are referenced in other places of the constructor.
object Foo {
val rab = useBar // oops, involuntarily referring to uninitialized val
val bar = "bar"
def useBar: String = bar.reverse
}
Foo // NPE
Of course, in a better world, the Scala compiler would either disallow the above code, re-order the initialization, or at least warn you. But it doesn't...
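One way to sidestep the ordering problem, sketched here with a renamed object (InitOrder is an illustrative name), is to make the forward-referenced value lazy so that it is initialized on first use instead of in textual order:

```scala
object InitOrder {
  val rab: String = useBar          // runs first and forces bar
  lazy val bar: String = "bar"      // initialized on first access
  def useBar: String = bar.reverse
}
```

With bar declared lazy, accessing InitOrder no longer throws, and rab holds the reversed string.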

Why is "lazy" a keyword rather than a standard-library type?

Scala keeps a lot of very useful constructs like Option and Try in its standard library.
Why is lazy given special treatment by having its own keyword when languages such as C#, which lack the aforementioned types, choose to implement it as a library feature?
It is true that you could define a lazy value for example like this:
object Lazy {
def apply[A](init: => A): Lazy[A] = new Lazy[A] {
private var value = null.asInstanceOf[A]
@volatile private var initialized = false
override def toString =
if (initialized) value.toString else "<lazy>@" + hashCode.toHexString
def apply(): A = {
if (!initialized) this.synchronized {
if (!initialized) {
value = init
initialized = true
}
}
value
}
}
implicit def unwrap[A](l: Lazy[A]): A = l()
}
trait Lazy[+A] { def apply(): A }
Usage:
val x = Lazy {
println("aqui")
42
}
def test(i: Int) = i * i
test(x)
On the other hand, having lazy as a language provided modifier has the advantage of allowing it to participate in the uniform access principle. I tried to look up a blog entry for it, but there isn't any that goes beyond getters and setters. This principle is actually more fundamental. For values, the following are unified: val, lazy val, def, var, object:
trait Foo[A] {
def bar: A
}
class FooVal[A](val bar: A) extends Foo[A]
class FooLazyVal[A](init: => A) extends Foo[A] {
lazy val bar: A = init
}
class FooVar[A](var bar: A) extends Foo[A]
class FooProxy[A](peer: Foo[A]) extends Foo[A] {
def bar: A = peer.bar
}
trait Bar {
def baz: Int
}
class FooObject extends Foo[Bar] {
object bar extends Bar {
val baz = 42
}
}
Lazy values were introduced in Scala 2.6. There is a Lambda the Ultimate comment which suggests that the reasoning might have to do with formalising the possibility to have cyclic references:
Cyclic dependencies require binding with lazy values. Lazy values can also be used to enforce that component initialization occurs in dependency order. Component shutdown order, sadly, must be coded by hand
I do not know why cyclic references could not be automatically handled by the compiler; perhaps there were reasons of complexity or a performance penalty. A blog post by Iulian Dragos confirms some of these assumptions.
The current lazy implementation uses an int bitmask to track if a field has been initialized, and no other memory overhead. This field is shared between multiple lazy vals (up to 32 lazy vals per field). It would be impossible to implement the feature with a similar memory efficiency as a library feature.
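To make the bitmask idea concrete, here is a hand-written approximation of two lazy fields sharing one Int bitmap. It is a sketch under stated assumptions, not the compiler's actual output; the class name TwoLazies and the evaluations counter are made up for illustration.

```scala
class TwoLazies {
  var evaluations = 0                       // demonstration only
  @volatile private var bitmap: Int = 0     // bit 0 guards a, bit 1 guards b
  private var aValue: String = null
  private var bValue: Int = 0

  def a: String = {
    if ((bitmap & 1) == 0) this.synchronized {
      if ((bitmap & 1) == 0) {              // re-check under the lock
        evaluations += 1
        aValue = "hello"
        bitmap = bitmap | 1
      }
    }
    aValue
  }

  def b: Int = {
    if ((bitmap & 2) == 0) this.synchronized {
      if ((bitmap & 2) == 0) {
        evaluations += 1
        bValue = 42
        bitmap = bitmap | 2
      }
    }
    bValue
  }
}
```

Each additional lazy field costs one more bit in the shared Int rather than a separate flag or wrapper object.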
Lazy as a library would probably look roughly like this:
class LazyVal[T](f: =>T) {
@volatile private var initialized = false
/*
this does not need to be volatile since there will always be an access to the
volatile field initialized before this is read.
*/
private var value:T = _
def apply() = {
if(!initialized) {
synchronized {
if(!initialized) {
value = f
initialized = true
}
}
}
value
}
}
The overhead of this would be an object for the closure f that generates the value, and another object for the LazyVal itself. So it would be substantial for a feature that is used as often as this.
On the CLR you have value types, so the overhead is not as bad if you implement your LazyVal as a struct in C#
However, now that macros are available, it might be a good idea to turn lazy into a library feature or at least allow customizing the lazy initialization. Many use cases of lazy val do not require thread synchronization, so it is wasteful to have the @volatile/synchronized overhead every time you use lazy val.
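For the single-threaded use cases mentioned here, a library cell without any synchronization could look roughly like the following. This is only a sketch; UnsyncLazy is a hypothetical name, and the class is not safe to share between threads.

```scala
// Not thread-safe by design: no @volatile flag and no synchronized block.
final class UnsyncLazy[T](init: => T) {
  private var initialized = false
  private var value: T = null.asInstanceOf[T]

  def apply(): T = {
    if (!initialized) {
      value = init          // the by-name argument is evaluated here, once
      initialized = true
    }
    value
  }
}
```

It still pays for the wrapper object and the closure discussed above; it only drops the synchronization cost.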

Is there an easy way to chain java setters that are void instead of return this

I have a bunch of auto-generated java code that I will be calling in scala. Currently all of the objects were generated with void setters instead of returning this which makes it really annoying when you need to set a bunch of values (I'm not going to use the constructor by initializing everything since there's like 50 fields). For example:
val o = new Obj()
o.setA("a")
o.setB("b")
o.setC("c")
It would be really cool if I could do something like this
val o = with(new Obj()) {
_.setA("a")
_.setB("b")
_.setC("c")
}
I can't use andThen with anonymous functions since they require objects to be returned. Am I stuck with the current way I'm doing things, or is there some magic I'm not aware of?
Sure, you can use tap (the Kestrel combinator), which you presently have to define yourself:
implicit class Tapper[A](val a: A) extends AnyVal {
def tap[B](f: A => B): A = { f(a); a }
def taps[B](fs: A => B*): A = { fs.map(_(a)); a }
}
It works like so:
scala> "salmon".taps(
| println,
| println
| )
salmon
salmon
res2: String = salmon
Note also
val myFavoriteObject = {
val x = new Obj
x.setA("a")
x // return the configured object; setA itself returns nothing
}
will allow you to use a short name to do all the setting while assigning to a more meaningful name for longer-term use.
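On Scala 2.13 and later, the standard library ships tap in scala.util.chaining, so the extension class does not have to be written by hand. A minimal sketch, where Obj is a hypothetical stand-in for the generated Java bean:

```scala
import scala.util.chaining._

// Hypothetical stand-in for a generated Java bean with void setters.
class Obj {
  private var a: String = ""
  private var b: String = ""
  def setA(v: String): Unit = { a = v }
  def setB(v: String): Unit = { b = v }
  def getA: String = a
  def getB: String = b
}

object TapDemo {
  // tap runs the side-effecting block and then returns the original object.
  val o: Obj = new Obj().tap { x =>
    x.setA("a")
    x.setB("b")
  }
}
```

This keeps the short name x local to the block while o is bound to the fully configured object.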
You can use an implicit converter from/to a wrapper class that allows chaining.
Something like:
case class ObjWrapper(o: Obj) {
def setA(a: String) = { o.setA(a); this }
def setB(b: String) = { o.setB(b); this }
def setC(c: String) = { o.setC(c); this }
}
implicit def wrapped2Obj(ow: ObjWrapper): Obj = ow.o
ObjWrapper(myObj).setA("a").setB("b").setC("c")
Actually you don't even need the implicit converter, since those methods have been called on myObj.
Take a look at Scalaxy/Beans. Note however that it's using macros, so it should be considered experimental.

Scala - construction order and early definition syntax

I'm trying to learn Scala and thought I would begin by reading "Scala for the Impatient". There he cites the problem of construction order by using the following classes:
class Animal {
val range: Int = 10
val env: Array[Int] = new Array[Int](range)
}
class Ant extends Animal {
override val range: Int = 2
}
and then he explained why the env ends up being an empty Array[Int] and proceeds to explain ways to prevent that, including the early definition syntax.
But... can't I prevent that just by doing this:
class Animal(val range: Int = 10) {
val env: Array[Int] = new Array[Int](range)
/* do animal stuff */
}
class Ant(override val range: Int = 2) extends Animal(range) {
/* do ant stuff */
}
Why is the early definition syntax really necessary?
I think a better way to look at the need for early instantiation comes from mixing in traits. With traits, you won't have a constructor that you can tweak to get around this kind of issue. Consider this very trivial and completely unrealistic example:
trait Foo{
val bar:String
val barLength = bar.length()
}
object MyFoo extends Foo{
val bar = "test"
}
As it stands right now, this code will throw a NullPointerException when MyFoo is created because bar will not yet be defined when bar.length() is invoked. But if you used early initialization, and redefined MyFoo as:
object MyFoo extends {val bar = "test"} with Foo{
}
then everything works just fine.
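Early definitions were dropped in Scala 3, so an alternative that works in both Scala 2 and 3 may be useful. One option, sketched below with renamed types (LazyLenFoo and MyLazyFoo are illustrative names), is to make the dependent member a lazy val so that it is evaluated only after the implementing object has set bar:

```scala
trait LazyLenFoo {
  val bar: String
  // Evaluated on first access, by which time bar has been initialized.
  lazy val barLength: Int = bar.length
}

object MyLazyFoo extends LazyLenFoo {
  val bar = "test"
}
```

Creating MyLazyFoo no longer throws, and barLength reports the length of the overriding value.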