best way to test Slick session/transaction code - scala

For creating the data source I have:
// assuming commons-dbcp 1.x, Typesafe Config and Slick 2.x
import org.apache.commons.dbcp.BasicDataSource
import com.typesafe.config.ConfigFactory
import scala.slick.jdbc.JdbcBackend.Database

object MyDataSource {
  private lazy val dataSource: javax.sql.DataSource = {
    val ds = new BasicDataSource
    val conf = ConfigFactory.load()
    val url = conf.getString("jdbc-url")
    val driver = conf.getString("jdbc-driver")
    val username = conf.getString("db-username")
    val password = conf.getString("db-password")
    val port = conf.getString("db-port")
    val maxActive = conf.getInt("max-active")
    val maxIdle = conf.getInt("max-idle")
    val initSize = conf.getInt("init-size")
    ds.setDriverClassName(driver)
    ds.setUsername(username)
    ds.setPassword(password)
    ds.setMaxActive(maxActive)
    ds.setMaxIdle(maxIdle)
    ds.setInitialSize(initSize)
    ds.setUrl(url)
    ds
  }
  lazy val database = Database.forDataSource(dataSource)
}
MyDataSource is used as below
def insertCompany = {
  MyDataSource.database.withSession { implicit session =>
    company.insert(companyRow)
  }
}
Now for testing I have a trait DatabaseSpec which loads the test database config (pointing to the test db) and has the following fixture:
def withSession(testCode: Session => Any) {
  val session = postgres.createSession()
  session.conn.setAutoCommit(false)
  try {
    testCode(session)
  } finally {
    session.rollback()
    session.close()
  }
}
And test code can then mix in DatabaseSpec and use withSession to test transactional code.
Now the question is: what is the best practice for keeping insertCompany abstracted away from MyDataSource.database.withSession, so that the method can be tested with DatabaseSpec against the test db?

The best way to be able to exchange a value, e.g. between production and testing, is to parameterize your code in that value. E.g.
def insertCompany(db: Database) = db.withSession(company.insert(companyRow)(_))
or
class DAO(db: Database) {
  def insertCompany = db.withSession(company.insert(companyRow)(_))
}
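For testing, the same DAO can then be constructed with a Database built from the test configuration instead of MyDataSource. A minimal sketch (assuming Slick 2.x; the test URL and credentials below are hypothetical):
import scala.slick.jdbc.JdbcBackend.Database

object Wiring {
  // production wiring
  lazy val prodDao = new DAO(MyDataSource.database)

  // test wiring: the same DAO, pointed at the test database
  lazy val testDatabase = Database.forURL(
    url = "jdbc:postgresql://localhost:5432/test_db", // hypothetical test db
    driver = "org.postgresql.Driver",
    user = "test",
    password = "test")
  lazy val testDao = new DAO(testDatabase)
}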
Keep it simple. Avoid unnecessary complexity like the Cake pattern, DI frameworks or mixin composition for this.
If you need to pass multiple values around... aggregate them into a "config"-class. Compose multiple config classes with different purposes to target different things, if you want to avoid writing one huge config class as stuff accumulates.
If you find yourself passing config objects to all your functions, you can mark them as implicit; that saves you at least the call-site code overhead. Or you can use something like scalaz's monadic function composition to avoid both call-site and definition-site code overhead for passing config around. It is sometimes called the Reader monad, but it is simply for-comprehension enabled composition of 1-argument functions.
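As a rough illustration of that Reader-style composition (a hand-rolled sketch, not the scalaz or Slick API; it reuses the question's company/companyRow):
// a computation that needs a Database to run
case class DBAction[+A](run: Database => A) {
  def map[B](f: A => B): DBAction[B] = DBAction(db => f(run(db)))
  def flatMap[B](f: A => DBAction[B]): DBAction[B] = DBAction(db => f(run(db)).run(db))
}

def insertCompany: DBAction[Int] =
  DBAction(db => db.withSession(company.insert(companyRow)(_)))

// composed without threading the Database around explicitly
def insertTwoCompanies: DBAction[Int] =
  for {
    a <- insertCompany
    b <- insertCompany
  } yield a + b

// only at the very edge do we supply the actual Database (prod or test)
// insertTwoCompanies.run(MyDataSource.database)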
Slick 2.2 will ship with something like that out-of-the-box and make what you want very easy.
Also, here is an approach I am currently playing around with, a composable configuration object "TMap". This code example shows step by step how you get from global imports over parameterized functions and making them implicit to using TMap and removing most boilerplate: https://github.com/cvogt/slick-action/blob/0.1/src/test/scala/org/cvogt/di/TMapTest.scala#L49

Related

flink parsing JSON in map: InvalidProgramException: Task not serializable

I am working on a Flink project and would like to parse the source JSON string data to a JSON object. I am using jackson-module-scala for the JSON parsing. However, I encountered some issues with using the JSON parser within Flink APIs (map, for example).
Here are some examples of the code; I cannot understand why it behaves like this under the hood.
Situation 1:
In this case, I am doing what the jackson-module-scala official example code tells me to do:
create a new ObjectMapper
register the DefaultScalaModule
DefaultScalaModule is a Scala object that includes support for all currently supported Scala data types.
call the readValue in order to parse the JSON to Map
The error I got is: org.apache.flink.api.common.InvalidProgramException:Task not serializable.
import org.apache.flink.streaming.api.scala._
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

object JsonProcessing {
  def main(args: Array[String]) {
    // set up the execution environment
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // get input data
    val text = env.readTextFile("xxx")
    val mapper = new ObjectMapper
    mapper.registerModule(DefaultScalaModule)
    val counts = text.map(mapper.readValue(_, classOf[Map[String, String]]))
    // execute and print result
    counts.print()
    env.execute("JsonProcessing")
  }
}
Situation 2:
Then I did some Googling and came up with the following solution, where registerModule is moved into the map function.
val mapper = new ObjectMapper
val counts = text.map { l =>
  mapper.registerModule(DefaultScalaModule)
  mapper.readValue(l, classOf[Map[String, String]])
}
However, what I am not able to understand is: why does this work when calling a method of the externally defined object mapper? Is it because ObjectMapper itself is Serializable, as stated here: ObjectMapper.java#L114?
Now the JSON parsing is working fine, but every time I have to call mapper.registerModule(DefaultScalaModule), which I think could cause a performance issue (does it?). I also tried another solution as follows.
Situation 3:
I created a new case class Jsen and used it as the corresponding parsing class, registering the Scala modules. And it is also working fine.
However, this is not so flexible if your input JSON varies very often; it is not maintainable to manage the class Jsen.
import com.fasterxml.jackson.annotation.JsonProperty

case class Jsen(
  @JsonProperty("a") a: String,
  @JsonProperty("c") c: String,
  @JsonProperty("e") e: String
)
object JsonProcessing {
  def main(args: Array[String]) {
    ...
    val mapper = new ObjectMapper
    val counts = text.map(mapper.readValue(_, classOf[Jsen]))
    ...
  }
}
Additionally, I also tried using JsonNode without calling registerModule as follows:
...
val mapper = new ObjectMapper
val counts = text.map(mapper.readValue(_, classOf[JsonNode]))
...
It is working fine as well.
My main question is: what is actually causing the problem of Task not serializable under the hood of registerModule(DefaultScalaModule)?
How to identify whether your code could potentially cause this unserializable problem during coding?
The thing is that Apache Flink is designed to be distributed. That means it needs to be able to run your code remotely, so all of your processing functions must be serializable. In the current implementation this is ensured early on, when you build your streaming job, even if you will not run it in any distributed mode. This is a trade-off with the obvious benefit of giving you feedback down to the very line that breaks this contract (via the exception stack trace).
So when you write
val counts = text.map(mapper.readValue(_, classOf[Map[String, String]]))
what you actually write is something like
val counts = text.map(new Function1[String, Map[String, String]] {
  val capturedMapper = mapper
  override def apply(param: String) = capturedMapper.readValue(param, classOf[Map[String, String]])
})
The important thing here is that you capture the mapper from the outside context and store it as part of your Function1 object, which has to be serializable. And this means that the mapper has to be serializable. The designers of the Jackson library recognized that kind of need, and since there is nothing fundamentally non-serializable in a mapper, they made their ObjectMapper and the default Modules serializable. Unfortunately for you, the designers of the Scala Jackson Module missed that and made their DefaultScalaModule deeply non-serializable by making ScalaTypeModifier and all sub-classes non-serializable. This is why your second code works while the first one doesn't: a "raw" ObjectMapper is serializable, while an ObjectMapper with a pre-registered DefaultScalaModule is not.
There are a few possible workarounds. Probably the easiest one is to wrap ObjectMapper
object MapperWrapper extends java.io.Serializable {
  // this lazy is the important trick here
  // @transient adds some safety in current Scala (see also the Update section)
  @transient lazy val mapper = {
    val mapper = new ObjectMapper
    mapper.registerModule(DefaultScalaModule)
    mapper
  }

  def readValue[T](content: String, valueType: Class[T]): T = mapper.readValue(content, valueType)
}
and then use it as
val counts = text.map(MapperWrapper.readValue(_, classOf[Map[String, String]]))
This lazy trick works because although an instance of DefaultScalaModule is not serializable, the function to create an instance of DefaultScalaModule is.
Update: what about @transient?
What are the differences here if I add lazy val vs. @transient lazy val?
This is actually a tricky question. What the lazy val is compiled to is actually something like this:
object MapperWrapper extends java.io.Serializable {
  // @transient is set or not set for both fields depending on its presence at "lazy val"
  [@transient] private var mapperValue: ObjectMapper = null
  [@transient] @volatile private var mapperInitialized = false

  def mapper: ObjectMapper = {
    if (!mapperInitialized) {
      this.synchronized {
        val mapper = new ObjectMapper
        mapper.registerModule(DefaultScalaModule)
        mapperValue = mapper
        mapperInitialized = true
      }
    }
    mapperValue
  }

  def readValue[T](content: String, valueType: Class[T]): T = mapper.readValue(content, valueType)
}
where @transient on the lazy val affects both backing fields. So now you can see why the lazy val trick works:
locally it works because it delays initialization of the mapperValue field until the first access to the mapper method, so the field is still safely null when the serialization check is performed
remotely it works because MapperWrapper is fully serializable and the logic of how the lazy val should be initialized is put into a method of the same class (see def mapper).
Note however that AFAIK this behavior of how lazy val is compiled is an implementation detail of the current Scala compiler rather than a part of the Scala specification. If at some later point a class similar to .NET's Lazy is added to the Java standard library, the Scala compiler might start generating different code. This is important because it provides a kind of trade-off for @transient. The benefit of adding @transient now is that it ensures that code like this works as well:
val someJson: String = "..."
val something: Something = MapperWrapper.readValue(someJson: String, ...)
val counts = text.map(MapperWrapper.readValue(_, classOf[Map[String, String]]))
Without @transient the code above would fail, because we forced initialization of the lazy backing field and now it contains a non-serializable value. With @transient this is not an issue, as that field will not be serialized at all.
A potential drawback of @transient is that if Scala changes the way code for lazy val is generated while the field is marked as @transient, it might actually not be deserialized in the remote-work scenario.
Also, there is a trick with object: for objects the Scala compiler generates custom deserialization logic (it overrides readResolve) to return the same singleton object. That means the object including the lazy val is not really deserialized and the value from the object itself is used. So a @transient lazy val inside an object is much more future-proof than inside a class in the remote scenario.
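For comparison, here is a minimal class-based sketch (not from the original answer): after deserialization on a worker, the @transient lazy val is simply rebuilt on first access, because the initializer logic lives in the generated accessor method:
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

class MapperHolder extends java.io.Serializable {
  // not serialized; re-initialized lazily on whichever JVM deserializes this instance
  @transient lazy val mapper: ObjectMapper = {
    val m = new ObjectMapper
    m.registerModule(DefaultScalaModule)
    m
  }

  def readValue[T](content: String, valueType: Class[T]): T =
    mapper.readValue(content, valueType)
}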

Create an Instance of class which does DI via Playframework Guice Independently in Scala

I'm working on Play Framework 2.5 with play-slick and programs related to it, such as batch jobs.
The current project structure is like:
/rootPlayProject
  /app
    /controllers
    /filters
    /services
    ...
  Modules
  /core (sub-project - DAOs, services are placed here)
  /batch (sub-project depends on core)
I'm using Guice DI almost everywhere, including Database Access Objects (DAOs).
Interfaces in core are bound in a Module placed in core, which ends up getting inherited by the Module in the root project.
Core Module(/rootPlayProject/core/CoreModule.scala)
class CoreModule extends AbstractModule {
  override def configure() = {
    bind(classOf[FooDAO]).to(classOf[FooDAOImpl])
    ....
  }
}
Root Module(/rootPlayProject/Modules.scala)
class Module extends CoreModule {
  override def configure() = {
    super.configure()
    bind(classOf[FooService]).to(classOf[FooServiceImpl])
  }
}
This works pretty well as a Play application. However, I would like to use the core module for batch programs, and I would like to run the batches without Play.
So far I tried something like this:
object BatchA {
  def main(args: Array[String]) = {
    val injector = Guice.createInjector(new CoreModule)
    val foo = injector.getInstance(classOf[FooDAO])
    // do something with foo.
  }
}
but since my DAOs require things that Play creates, such as ExecutionContext, play.api.db.slick.DatabaseConfigProvider and @play.db.NamedDatabase, the above code does not run.
My question is: how can I get those things bound without the Play application builder?
Thanks in advance.
The answer depends on whether or not you want to actually decouple Play Framework from your DAOs.
Option 1: Don't Decouple
Your main method could simply have the following lines preceding your val injector line:
val application = new GuiceApplicationBuilder()
  .in(Environment(new File("."), this.getClass.getClassLoader, Mode.Prod))
  .build
Play.start(application)
Option 2: Decouple
Or, you can provide injectable classes that can provide the ExecutionContext specific to the environment. If you want to inject the DatabaseConfigProvider, you will have to perform further abstractions to remove the direct dependency on Play. The annotation will follow the Play-specific implementation of the abstraction.
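A rough sketch of such an abstraction (all names here are hypothetical, not Play API):
import com.google.inject.AbstractModule
import scala.concurrent.ExecutionContext

// The core module depends only on this trait, not on Play.
trait ExecutionContextProvider {
  def ec: ExecutionContext
}

// Batch-side implementation and binding: no Play involved.
class BatchExecutionContextProvider extends ExecutionContextProvider {
  val ec: ExecutionContext = ExecutionContext.global
}

class BatchModule extends AbstractModule {
  override def configure(): Unit =
    bind(classOf[ExecutionContextProvider]).to(classOf[BatchExecutionContextProvider])
}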
In the case of my own project where I encountered this scenario, I opted for Option 1, as the dependency on Play wasn't severe enough of an impact for me.
GuiceInjectorBuilder does the trick.
trait PlayInjector {
  lazy val injector = new GuiceInjectorBuilder()
    .configure(Configuration.load(Environment.simple(mode = Mode.Dev)))
    .bindings(new BuiltinModule, new CoreModule, new SlickModule)
    .disable(classOf[Application])
    .injector

  def closeInjector = injector.instanceOf[DefaultApplicationLifecycle].stop
}
BuiltinModule binds Play basic modules such as ExecutionContext, ExecutionContextExecutor or ActorSystem.
You could write your own Module to bind only the things you need; however, using BuiltinModule and disabling the classes you don't need is simpler.
How to use it:
object Foo extends PlayInjector {
  def main(args: Array[String]) = {
    val foo = injector.instanceOf[FooDAO]
    val bar = injector.instanceOf[BarDAO]
    // do something
    // when you finish things you want to do
    closeInjector
  }
}
Since some modules like play-slick use ApplicationLifecycle.addStopHook to handle shutdown, it's safer to execute those hooks instead of just calling sys.exit().

Task not serializable: java.io.NotSerializableException when calling function outside closure only on classes not objects

Getting strange behavior when calling a function outside of a closure:
when the function is in an object, everything works
when the function is in a class, I get:
Task not serializable: java.io.NotSerializableException: testing
The problem is that I need my code in a class and not an object. Any idea why this is happening? Is a Scala object serializable by default?
This is a working code example:
object working extends App {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  // calling function outside closure
  val after = rddList.map(someFunc(_))

  def someFunc(a: Int) = a + 1

  after.collect().map(println(_))
}
This is the non-working example:
object NOTworking extends App {
  new testing().doIT
}

// adding extends Serializable won't help
class testing {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash (spark lazy)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
RDDs extend the Serializable interface, so that is not what's causing your task to fail. But this doesn't mean that you can serialize an RDD with Spark and avoid NotSerializableException.
Spark is a distributed computing engine and its main abstraction is the resilient distributed dataset (RDD), which can be viewed as a distributed collection. Basically, an RDD's elements are partitioned across the nodes of the cluster, but Spark abstracts this away from the user, letting the user interact with the RDD (collection) as if it were a local one.
Not to get into too many details, but when you run different transformations on an RDD (map, flatMap, filter and others), your transformation code (closure) is:
serialized on the driver node,
shipped to the appropriate nodes in the cluster,
deserialized,
and finally executed on the nodes
You can of course run this locally (as in your example), but all those phases (apart from shipping over the network) still occur. This lets you catch any bugs even before deploying to production.
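If you want to catch this even earlier during development, a crude way is to push a closure through plain Java serialization yourself, which is roughly what Spark's check does (a sketch, not a Spark API):
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

object SerializabilityCheck {
  // Throws java.io.NotSerializableException if obj, or anything it captures, cannot be serialized
  def assertSerializable(obj: AnyRef): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try out.writeObject(obj) finally out.close()
  }
}

// e.g. SerializabilityCheck.assertSerializable((x: Int) => x + 1)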
What happens in your second case is that you are calling a method defined in class testing from inside the map function. Spark sees that, and since methods cannot be serialized on their own, Spark tries to serialize the whole testing class, so that the code will still work when executed in another JVM. You have two possibilities:
Either you make class testing serializable, so the whole class can be serialized by Spark:
import org.apache.spark.{SparkContext, SparkConf}

object Spark {
  val ctx = new SparkContext(new SparkConf().setAppName("test").setMaster("local[*]"))
}

object NOTworking extends App {
  new Test().doIT
}

class Test extends java.io.Serializable {
  val rddList = Spark.ctx.parallelize(List(1, 2, 3))

  def doIT() = {
    val after = rddList.map(someFunc)
    after.collect().foreach(println)
  }

  def someFunc(a: Int) = a + 1
}
or you make someFunc a function instead of a method (functions are objects in Scala), so that Spark will be able to serialize it:
import org.apache.spark.{SparkContext, SparkConf}

object Spark {
  val ctx = new SparkContext(new SparkConf().setAppName("test").setMaster("local[*]"))
}

object NOTworking extends App {
  new Test().doIT
}

class Test {
  val rddList = Spark.ctx.parallelize(List(1, 2, 3))

  def doIT() = {
    val after = rddList.map(someFunc)
    after.collect().foreach(println)
  }

  val someFunc = (a: Int) => a + 1
}
A similar, but not identical, problem with class serialization may also be of interest to you; you can read about it in this Spark Summit 2013 presentation.
As a side note, you can rewrite rddList.map(someFunc(_)) to rddList.map(someFunc), they are exactly the same. Usually, the second is preferred as it's less verbose and cleaner to read.
EDIT (2015-03-15): SPARK-5307 introduced SerializationDebugger, and Spark 1.3.0 is the first version to use it. It adds the serialization path to a NotSerializableException. When a NotSerializableException is encountered, the debugger visits the object graph to find the path towards the object that cannot be serialized, and constructs information to help the user find the object.
In OP's case, this is what gets printed to stdout:
Serialization stack:
- object not serializable (class: testing, value: testing#2dfe2f00)
- field (class: testing$$anonfun$1, name: $outer, type: class testing)
- object (class testing$$anonfun$1, <function1>)
Grega's answer is great in explaining why the original code does not work and two ways to fix the issue. However, this solution is not very flexible; consider the case where your closure includes a method call on a non-Serializable class that you have no control over. You can neither add the Serializable tag to this class nor change the underlying implementation to change the method into a function.
Nilesh presents a great workaround for this, but the solution can be made both more concise and general:
def genMapper[A, B](f: A => B): A => B = {
  val locker = com.twitter.chill.MeatLocker(f)
  x => locker.get.apply(x)
}
This function-serializer can then be used to automatically wrap closures and method calls:
rdd map genMapper(someFunc)
This technique also has the benefit of not requiring the additional Shark dependencies in order to access KryoSerializationWrapper, since Twitter's Chill is already pulled in by core Spark
Complete talk fully explaining the problem, which proposes a great paradigm shifting way to avoid these serialization problems: https://github.com/samthebest/dump/blob/master/sams-scala-tutorial/serialization-exceptions-and-memory-leaks-no-ws.md
The top voted answer is basically suggesting throwing away an entire language feature - that is no longer using methods and only using functions. Indeed in functional programming methods in classes should be avoided, but turning them into functions isn't solving the design issue here (see above link).
As a quick fix in this particular situation you could just use the @transient annotation to tell it not to try to serialise the offending value (here, Spark.ctx is a custom class, not Spark's, following the OP's naming):
@transient
val rddList = Spark.ctx.parallelize(list)
You can also restructure code so that rddList lives somewhere else, but that is also nasty.
The Future is Probably Spores
In the future Scala will include these things called "spores" that should allow us fine-grained control over what does and does not get pulled in by a closure. Furthermore, this should turn all mistakes of accidentally pulling in non-serializable types (or any unwanted values) into compile errors, rather than the current situation of horrible runtime exceptions / memory leaks.
http://docs.scala-lang.org/sips/pending/spores.html
A tip on Kryo serialization
When using Kryo, make it so that registration is necessary; this means you get errors instead of memory leaks:
"Finally, I know that kryo has kryo.setRegistrationOptional(true) but I am having a very difficult time trying to figure out how to use it. When this option is turned on, kryo still seems to throw exceptions if I haven't registered classes."
Strategy for registering classes with kryo
Of course this only gives you type-level control not value-level control.
... more ideas to come.
I faced a similar issue, and what I understand from Grega's answer is:
object NOTworking extends App {
  new testing().doIT
}

// adding extends Serializable won't help
class testing {
  val list = List(1, 2, 3)
  val rddList = Spark.ctx.parallelize(list)

  def doIT = {
    // again calling the function someFunc
    val after = rddList.map(someFunc(_))
    // this will crash (spark lazy)
    after.collect().map(println(_))
  }

  def someFunc(a: Int) = a + 1
}
Your doIT method is trying to serialize the someFunc(_) method, but since methods are not serializable, it tries to serialize the whole class testing, which again is not serializable.
So, to make your code work, you should define someFunc inside the doIT method. For example:
def doIT = {
  // function definition
  def someFunc(a: Int) = a + 1

  val after = rddList.map(someFunc(_))
  after.collect().map(println(_))
}
And if there are multiple functions coming into the picture, then all of those functions should be available to the parent context.
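A small sketch of that, keeping every helper local to the method so the closure stays self-contained (otherFunc is a hypothetical second helper):
def doIT = {
  def someFunc(a: Int) = a + 1
  def otherFunc(a: Int) = a * 2 // hypothetical second helper

  val after = rddList.map(x => otherFunc(someFunc(x)))
  after.collect().foreach(println)
}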
I solved this problem using a different approach. You simply need to serialize the objects before passing through the closure, and de-serialize afterwards. This approach just works, even if your classes aren't Serializable, because it uses Kryo behind the scenes. All you need is some curry. ;)
Here's an example of how I did it:
def genMapper(kryoWrapper: KryoSerializationWrapper[(Foo => Bar)])
             (foo: Foo): Bar = {
  kryoWrapper.value.apply(foo)
}

val mapper = genMapper(KryoSerializationWrapper(new Blah(abc))) _
rdd.flatMap(mapper).collectAsMap()

class Blah(abc: ABC) extends (Foo => Bar) {
  def apply(foo: Foo): Bar = {
    // This is the real function
    ???
  }
}
Feel free to make Blah as complicated as you want, class, companion object, nested classes, references to multiple 3rd party libs.
KryoSerializationWrapper refers to: https://github.com/amplab/shark/blob/master/src/main/scala/shark/execution/serialization/KryoSerializationWrapper.scala
I'm not entirely certain that this applies to Scala but, in Java, I solved the NotSerializableException by refactoring my code so that the closure did not access a non-serializable final field.
Scala methods defined in a class are not serializable; methods can be converted into functions to resolve the serialization issue.
Method syntax:
def func_name(x: String): String = {
  ...
  return x
}
Function syntax:
val func_name = { (x: String) =>
  ...
  x
}
FYI in Spark 2.4 a lot of you will probably encounter this issue. Kryo serialization has gotten better but in many cases you cannot use spark.kryo.unsafe=true or the naive kryo serializer.
For a quick fix try changing the following in your Spark configuration
spark.kryo.unsafe="false"
OR
spark.serializer="org.apache.spark.serializer.JavaSerializer"
I modify custom RDD transformations that I encounter or personally write by using explicit broadcast variables and the new built-in twitter-chill API, converting them from rdd.map(row => ...) to rdd.mapPartitions(partition => { ... }) functions.
Example
Old (not-great) Way
val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
val outputRDD = rdd.map(row => {
  val value = sampleMap.get(row._1)
  value
})
Alternative (better) Way
import com.twitter.chill.MeatLocker

val sampleMap = Map("index1" -> 1234, "index2" -> 2345)
val brdSerSampleMap = spark.sparkContext.broadcast(MeatLocker(sampleMap))

rdd.mapPartitions(partition => {
  // unwrap the Kryo-serialized value once per partition
  val deSerSampleMap = brdSerSampleMap.value.get
  partition.map(row => {
    val value = deSerSampleMap.get(row._1)
    value
  }).toIterator
})
This new way will only access the broadcast variable once per partition, which is better. You will still need to use Java serialization if you do not register classes.
I had a similar experience.
The error was triggered when I initialize a variable on the driver (master), but then tried to use it on one of the workers.
When that happens, Spark Streaming will try to serialize the object to send it over to the worker, and fail if the object is not serializable.
I solved the error by making the variable static.
Previous non-working code
private final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Working code
private static final PhoneNumberUtil phoneUtil = PhoneNumberUtil.getInstance();
Credits:
https://learn.microsoft.com/en-us/answers/questions/35812/sparkexception-job-aborted-due-to-stage-failure-ta.html ( The answer of pradeepcheekatla-msft)
https://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/javaionotserializableexception.html
def upper(name: String): String = {
  val upperCased: String = name.toUpperCase()
  upperCased
}

val toUpperName = udf { (EmpName: String) => upper(EmpName) }

val emp_details = """[{"id": "1","name": "James Butt","country": "USA"},
{"id": "2", "name": "Josephine Darakjy","country": "USA"},
{"id": "3", "name": "Art Venere","country": "USA"},
{"id": "4", "name": "Lenna Paprocki","country": "USA"},
{"id": "5", "name": "Donette Foller","country": "USA"},
{"id": "6", "name": "Leota Dilliard","country": "USA"}]"""

val df_emp = spark.read.json(Seq(emp_details).toDS())
val df_name = df_emp.select($"id", $"name")
val df_upperName = df_name.withColumn("name", toUpperName($"name")).filter("id='5'")
display(df_upperName)
This will give the error:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
Solution -
import java.io.Serializable

object obj_upper extends Serializable {
  def upper(name: String): String = {
    val upperCased: String = name.toUpperCase()
    upperCased
  }

  val toUpperName = udf { (EmpName: String) => upper(EmpName) }
}
val df_upperName = df_name.withColumn("name", obj_upper.toUpperName($"name")).filter("id='5'")
display(df_upperName)
My solution was to add a companion object that handles all the methods that are not serializable within the class.

Scala Cake Pattern and Dependency Collisions

I'm trying to implement dependency injection in Scala with the Cake Pattern, but am running into dependency collisions. Since I could not find a detailed example with such dependencies, here's my problem:
Suppose we have the following trait (with 2 implementations):
trait HttpClient {
  def get(url: String)
}

class DefaultHttpClient1 extends HttpClient {
  def get(url: String) = ???
}

class DefaultHttpClient2 extends HttpClient {
  def get(url: String) = ???
}
And the following two cake pattern modules (which in this example are both APIs that depend on our HttpClient for their functionality):
trait FooApiModule {
  def httpClient: HttpClient // dependency

  lazy val fooApi = new FooApi() // providing the module's service

  class FooApi {
    def foo(url: String): String = {
      val res = httpClient.get(url)
      // ... something foo specific
      ???
    }
  }
}
and
trait BarApiModule {
  def httpClient: HttpClient // dependency

  lazy val barApi = new BarApi() // providing the module's service

  class BarApi {
    def bar(url: String): String = {
      val res = httpClient.get(url)
      // ... something bar specific
      ???
    }
  }
}
Now when creating the final app that uses both modules, we need to provide the httpClient dependency for both of the modules. But what if we want to provide a different implementation of it for each of the modules? Or simply provide different instances of the dependency configured differently (say with a different ExecutionContext for example)?
object MyApp extends FooApiModule with BarApiModule {
  // the same dependency supplied to both modules
  val httpClient = new DefaultHttpClient1()

  def run() = {
    val r1 = fooApi.foo("http://...")
    val r2 = barApi.bar("http://...")
    // ...
  }
}
We could name the dependencies differently in each module, prefixing them with the module name, but that would be cumbersome and inelegant, and also won't work if we don't have full control of the modules ourselves.
Any ideas? Am I misinterpreting the Cake Pattern?
You understand the pattern correctly and you've just discovered its important limitation. If two modules depend on some object (say an HttpClient) and happen to declare it under the same name (like httpClient), the game is over: you won't be able to configure them separately inside one Cake. Either have two Cakes, like Daniel advises, or change the modules' sources if you can (as Tomer Gabel is hinting).
Each of those solutions has its problems.
Having two Cakes (Daniel's advice) works well as long as they don't need some common dependencies.
Renaming some dependencies (provided it's possible) forces you to adjust all code that uses those.
Therefore some people (including me) prefer solutions immune to those problems, like using plain old constructors and avoiding Cake altogether. If you measure it, they don't add much bloat to the code (Cake is already pretty verbose) and they're much more flexible.
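For reference, a plain-constructor version of the same modules (a sketch reusing the question's types):
class FooApi(httpClient: HttpClient) {
  def foo(url: String): String = {
    val res = httpClient.get(url)
    // ... something foo specific
    ???
  }
}

class BarApi(httpClient: HttpClient) {
  def bar(url: String): String = {
    val res = httpClient.get(url)
    // ... something bar specific
    ???
  }
}

object MyApp {
  // each API gets its own client, so no name collision is possible
  val fooApi = new FooApi(new DefaultHttpClient1())
  val barApi = new BarApi(new DefaultHttpClient2())
}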
"You're doing it wrong" (TM). You'd have the exact same problem with Spring, Guice or any IoC container: you're treating types as names (or symbols); you're saying "Give me an HTTP client" instead of "Give me an HTTP client suitable for communicating with fooApi".
In other words, you have multiple HTTP clients all named httpClient, which does not allow you to make any distinction between different instances. It's kind of like taking an @Autowired HttpClient without some way to qualify the reference (in Spring's case, usually by bean ID with external wiring).
In the cake pattern, one way to resolve this is to qualify that distinction with a different name: FooApiModule requires e.g. a def http10HttpClient: HttpClient and BarApiModule requires def connectionPooledHttpClient: HttpClient. When "filling in" the different modules, the different names both reference two different instances but are also indicative of the constraints the two modules place on their dependencies.
An alternative (workable albeit not as clean in my opinion) is to simply require a module-specific named dependency, i.e. def fooHttpClient: HttpClient, which simply forces an explicit external wiring on whomever mixes your module in.
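A sketch of that renamed-dependency variant, applied to the question's modules:
trait FooApiModule {
  def fooHttpClient: HttpClient // module-specific name

  lazy val fooApi = new FooApi()

  class FooApi {
    def foo(url: String): String = {
      val res = fooHttpClient.get(url)
      ???
    }
  }
}

trait BarApiModule {
  def barHttpClient: HttpClient // module-specific name
  // ... BarApi as before, but using barHttpClient
}

object MyApp extends FooApiModule with BarApiModule {
  // no collision: different names can be wired to different instances
  val fooHttpClient = new DefaultHttpClient1()
  val barHttpClient = new DefaultHttpClient2()
}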
Instead of extending FooApiModule and BarApiModule in a single place -- which would mean they share dependencies -- make them both separate objects, each with their dependencies solved accordingly.
Seems to be the known "robot legs" problem. You need to construct two legs of a robot, however you need to supply two different feet to them.
How to use the cake pattern to have both common dependencies and separate?
Let's have L1 <- A, B1; L2 <- A, B2. And you want to have Main <- L1, L2, A.
To have separate dependencies we need two instances of smaller cakes, parameterized with common dependencies.
trait LegCommon { def a: A }

trait Bdep { def b: B }

abstract class L(val common: LegCommon) extends Bdep {
  import common._
  // declarations of Leg. Have both A and B.
}

trait B1module extends Bdep {
  val b = new B1
}

trait B2module extends Bdep {
  def b = new B2
}
In Main we'll have the common part in the cake and the two legs:
trait Main extends LegCommon {
  val l1 = new L(this) with B1module
  val l2 = new L(this) with B2module
  val a = new A
}
Your final app should look like this:
object MyApp {
  val fooApi = new FooApiModule {
    val httpClient = new DefaultHttpClient1()
  }.fooApi

  val barApi = new BarApiModule {
    val httpClient = new DefaultHttpClient2()
  }.barApi

  ...

  def run() = {
    val r1 = fooApi.foo("http://...")
    val r2 = barApi.bar("http://...")
    // ...
  }
}
That should work. (Adapted from this blog post: http://www.cakesolutions.net/teamblogs/2011/12/19/cake-pattern-in-depth/)

Scala trait mixins for an object/ typesafe Config

This is an OO design question. I'm using Typesafe Config in my app. The Config interface is very useful; however, there are a couple of fields in my application's config file that are mandatory. What I wanted to do was create a sub-interface of Config and add these 2 top-level methods. Something like this:
trait AppConfig extends Config {
  def param1: String
  def param2: String
}
However, creating a real instance of AppConfig given an instance of Config doesn't seem feasible. (I don't want to create a wrapper object and duplicate all the methods on the Config interface.) Ideally, I'm looking for something that would achieve something close to this:
val conf: Config = // get config object from somewhere
return conf with AppConfig { override def param1 = { "blah" } }
I do understand the last line is not valid, but I'm looking for a pattern/construct with equivalent functionality.
We've been using Configrity for things like this. Here's an example:
We keep our compiled defaults that we use for unit test/etc in objects
object DefaultConfigs {
  val defaultConfig = Configuration(
    "param1" -> "blah"
  )
  val otherConfig = Configuration(
    "param2" -> "blah"
  )
  val appConfig = defaultConfig.include(otherConfig)
}
And then at run time we can include them or not
val appConfig = Configuration.load(pathToConfig) include DefaultConfigs.appConfig
What you are looking for is basically what some call a "dynamic mixin". This is not supported out of the box by scala.
Someone developed a compiler plugin to support this: http://www.artima.com/weblogs/viewpost.jsp?thread=275588
However it's a bit old and the project does not seem active anymore.
A better alternative would be to implement this feature using scala macros (requires scala 2.10, which has no stable release yet).
All of this is probably overkill in your case though. Until some well-tested library is available, I would advise just creating the proxy by hand (however dull that may look):
trait ConfigProxy extends Config {
  def impl: Config

  // Forward to the inner config
  def root: ConfigObject = impl.root
  def origin: ConfigOrigin = impl.origin
  // ... and so on
}
val conf: Config = // get config object from somewhere
val myConf: AppConfig = new AppConfig with ConfigProxy {
  val impl = conf
  val param1: String = "foo"
  val param2: String = "bar"
}
How about using a combination of Dynamic and reflection? Dynamic handles your convenience methods and reflection handles the methods on Config.
Here's a stab:
import scala.language.dynamics

class ConfigDynamic(val config: Config) extends Dynamic {
  def selectDynamic(name: String) = {
    name match {
      case "field1" =>
        config.getString("field1")
      case x @ _ =>
        // overly simplified here: assumes a no-arg, accessible method of that name
        val meth = config.getClass.getDeclaredMethod(x)
        meth.invoke(config)
    }
  }
}
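Hypothetical usage of the wrapper above: "field1" hits the special case, while anything else goes through the (overly simplified) reflective branch against the underlying Config:
import com.typesafe.config.{Config, ConfigFactory}

val conf: Config = ConfigFactory.load()
val dyn = new ConfigDynamic(conf)
val v = dyn.field1 // goes through config.getString("field1")
// dyn.someOtherName would hit the reflective branch and only work if the
// underlying config class actually declares a no-arg method with that name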