Passing a list of functions to a method to be executed later with data supplied - scala

I am trying to build a multi level Validator object that is fairly generic. The idea being you have levels of validations if Level 1 passes then you do Level 2, etc. but I am struggling with one specific area: creating a function call but not executing it until a later point.
Data:
case class FooData(alpha: String, beta: String) extends AllData
case class BarData(gamma: Int, delta: Int) extends AllData
ValidationError:
case class ValidationError(code: String, message: String)
Validator:
object Validator {
def validate(validations: List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Sample Validator:
object CorrectNameFormatValidator extends Validation {
def validate(str: String): Seq[ValidationError] = {
...
}
}
How I wish to use it:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
Am I doing something really convoluted when I don't need to be or am I just missing a component?
Essentially I want to define when a function will be called and with which parameters but I don't want the call to happen until I say within the Validator. Is this something that's possible?

You can use lazy val or def when defining levelOneValidation, levelTwoValidations and validationLevel:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
lazy val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
lazy val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
lazy val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
You also need to change validate method to get the validations ByName and not
ByValue using : => :
object Validator {
def validate(validations: => List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Anyway, I think you can implement it differently by just using some OOP design patterns, like Chain of Responsibility.

Related

Scala Action Map Implementation Issue (follow up)

This is a fairly long winded question and a follow up to my last one.
I have the following code for an application being built - I am looking to call the function in handleOne but it is not working in the action map. I think this is due to the unit assigned to statesVotes in the handler. The goal is to create a menu driven application that performs a set of desired functions. The function in question here is: Get all the state values and display suitably formatted.
Potentially have to make the states into a map but looking for the same functionality of the case class.
import scala.io.StdIn.readInt
object myApp3 extends App{
val dataRE = "([^(]+) \\((\\d+)\\),(.+)".r
val pVotes = "([^:]+):(\\d+)".r
case class State(name : String
,code : Int
,parties : Array[(String,Int)])
val states: List[State] =
util.Using(io.Source.fromFile("filename.txt"))(_.getLines().toList)
.get //will throw if read file fails
.collect{case dataRE(name,code,votes) =>
State(name.trim
,code.toInt
,votes.split(",")
.collect{case pVotes(p,v) => (p,v.toInt)}
)
}
val actionMap = Map[Int, () => Boolean](1 -> handleOne)
var opt = 0
do{
opt = readOption
} while (menu(opt))
def readOption: Int = {
println(
"""|Please select one of the following:
| 1 - Show All States and Votes
| 2 - CW Option 2
| 3 - quit""".stripMargin)
readInt()
}
def menu(option: Int): Boolean = {
actionMap.get(option) match {
case Some(f) => f()
case None =>
println("Command not recognized!")
true
}
}
// handle one calls function mnuShowStatesVotes, which invokes function statesVotes
def handleOne(): Boolean = {
mnuShowStatesVotes(statesVotes : List[State])
true
}
def mnuShowStatesVotes(f:() => List[State]) = {
f() foreach(println())
}
def statesVotes = states.sortBy(_.name) //alphabetical order of states
.foreach{ st =>
println(st.name) //show line by split by state name
st.parties
.sortBy(-_._2) //sorts parties by votes in descending order
.map{case (p,v) => f"\t$p%-12s:$v%9d"}
.foreach(println)
}
}
Essentially want the menu option handleOne to correctly invoke the function in statesVotes.
The text file being used can be found below:
Alabama (9),Democratic:849624,Republican:1441170,Libertarian:25176,Others:7312
Alaska (3),Democratic:153778,Republican:189951,Libertarian:8897,Others:6904
Arizona (11),Democratic:1672143,Republican:1661686,Libertarian:51465,Green:1557,Others:475
It seems to me that your code would benefit by adopting a clear and distinct separation/segregation of roles and responsibilities.
Let's get the preliminaries taken care of.
import scala.util.{Try, Success, Failure, Using}
case class State(name : String
,code : Int
,parties : Array[(String,Int)])
Now let's parse the input data.
This code has one job to do: load the data from the input file. It takes one parameter, the input filename, and returns either Success() with the accumulated data, or Failure() with the error exception.
def readFile(filename: String): Try[List[State]] = {
val dataRE = "([^(]+) \\((\\d+)\\),(.+)".r
val pVotes = "([^:]+):(\\d+)".r
Using(io.Source.fromFile(filename)) {
_.getLines()
.toList
.collect{ case dataRE(name, code, votes) =>
State(name.trim
,code.toInt
,votes.split(",")
.collect{case pVotes(p,v) => (p,v.toInt)})
}
}
}
Note that collect() will simply ignore file data the doesn't fit the expected format. If you were to use map() instead then bad input data would cause a Failure().
Now let's put all the output methods, and their descriptions, under one roof. This is most of what the user will see.
class Menu(states: List[State]) {
def apply(key: String): Boolean = {
val (_, op, continue) = lookup(key)
op()
continue
}
private val lookup: Map[String,(String,()=>Unit,Boolean)] =
Map("?" -> ("show this menu", menu _, true)
,"menu" -> ("show this menu", menu _, true)
,"all" -> ("display all voting data", all _, true)
,"st" -> ("vote totals by state", stVotes _, true)
,"x" -> ("exit", done _, false)
,"quit" -> ("exit", done _, false)
).withDefaultValue(("",unknown _, true))
private def done(): Unit = println("bye")
private def unknown(): Unit =
println("unknown selection ('?' for main menu)")
private def menu(): Unit =
lookup.keys.toVector.sorted
.map(k => s"$k\t: ${lookup(k)._1}")
.foreach(println)
private def all(): Unit =
states.sortBy(_.name) //alphabetical
.foreach{ st =>
println(st.name) //state name
st.parties
.sortBy(-_._2) //votes in decreasing order
.map{case (p,v) => f"\t$p%-12s:$v%9d"}
.foreach(println)
}
private def stVotes(): Unit =
states.map(st => (st.name, st.parties.map(_._2).sum))
.sortBy(-_._2) //votes in decreasing order
.map{case (state,total) => f"$state%-9s:$total%8d"}
.foreach(println)
}
Notice that only the apply() method is public. Everything else is private and under wraps.
To create a new data report you just add an entry in the lookup Map and add the new method to produce the output.
Now all we need is the code to tie the pieces together and to take user input.
def main(args: Array[String]): Unit =
args.headOption.map(readFile) match {
case None =>
println(s"usage: ${this.productPrefix} <data_file>")
case Some(Failure(exc)) =>
println(s"Error reading data file: $exc")
case Some(Success(stateData)) =>
val menu = new Menu(stateData)
menu("menu")
Iterator.continually(menu(io.StdIn.readLine(">> ").toLowerCase))
.dropWhile(identity)
.next()
}
Note that this.productPrefix is made available if the surrounding object is a case object.

Access Spark broadcast variable in different classes

I am broadcasting a value in Spark Streaming application . But I am not sure how to access that variable in a different class than the class where it was broadcasted.
My code looks as follows:
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreachRDD(rdd => {
val obj = AppObject1
rdd.filter(p => obj.apply(p))
rdd.count
}
}
object AppObject1: Boolean{
def apply(str: String){
AnotherObject.process(str)
}
}
object AnotherObject{
// I want to use broadcast variable in this object
val B = broadcastA.Value // compilation error here
def process(): Boolean{
//need to use B inside this method
}
}
Can anyone suggest how to access broadcast variable in this case?
There is nothing particularly Spark specific here ignoring possible serialization issues. If you want to use some object it has to be available in the current scope and you can achieve this the same way as usual:
you can define your helpers in a scope where broadcast is already defined:
{
...
val x = sc.broadcast(1)
object Foo {
def foo = x.value
}
...
}
you can use it as a constructor argument:
case class Foo(x: org.apache.spark.broadcast.Broadcast[Int]) {
def foo = x.value
}
...
Foo(sc.broadcast(1)).foo
method argument
case class Foo() {
def foo(x: org.apache.spark.broadcast.Broadcast[Int]) = x.value
}
...
Foo().foo(sc.broadcast(1))
or even mixed-in your helpers like this:
trait Foo {
val x: org.apache.spark.broadcast.Broadcast[Int]
def foo = x.value
}
object Main extends Foo {
val sc = new SparkContext("local", "test", new SparkConf())
val x = sc.broadcast(1)
def main(args: Array[String]) {
sc.parallelize(Seq(None)).map(_ => foo).first
sc.stop
}
}
Just a short take on performance considerations that were introduced earlier.
Options proposed by zero233 are indeed very elegant way of doing this kind of things in Scala. At the same time it is important to understand implications of using certain patters in distributed system.
It is not the best idea to use mixin approach / any logic that uses enclosing class state. Whenever you use a state of enclosing class within lambdas Spark will have to serialize outer object. This is not always true but you'd better off writing safer code than one day accidentally blow up the whole cluster.
Being aware of this, I would personally go for explicit argument passing to the methods as this would not result in outer class serialization (method argument approach).
you can use classes and pass the broadcast variable to classes
your psudo code should look like :
object AppMain{
def main(args: Array[String]){
//...
val broadcastA = sc.broadcast(a)
//..
lines.foreach(rdd => {
val obj = new AppObject1(broadcastA)
rdd.filter(p => obj.apply(p))
rdd.count
})
}
}
class AppObject1(bc : Broadcast[String]){
val anotherObject = new AnotherObject(bc)
def apply(str: String): Boolean ={
anotherObject.process(str)
}
}
class AnotherObject(bc : Broadcast[String]){
// I want to use broadcast variable in this object
def process(str : String): Boolean = {
val a = bc.value
true
//need to use B inside this method
}
}

cache using functional callbacks/ proxy pattern implementation scala

How to implement cache using functional programming
A few days ago I came across callbacks and proxy pattern implementation using scala.
This code should only apply inner function if the value is not in the map.
But every time map is reinitialized and values are gone (which seems obivous.
How to use same cache again and again between different function calls
class Aggregator{
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
(t:Int) => {
if (!cache.contains(t)) {
println("Evaluating..."+t)
val r = function.apply(t);
cache.put(t,r)
r
}
else
{
cache.get(t).get;
}
}
}
def memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
}
object Aggregator {
def main( args: Array[String] ) {
val agg = new Aggregator()
agg.memoizedDoubler(2)
agg.memoizedDoubler(2)// It should not evaluate again but does
agg.memoizedDoubler(3)
agg.memoizedDoubler(3)// It should not evaluate again but does
}
I see what you're trying to do here, the reason it's not working is that every time you call memoizedDoubler it's first calling memorize. You need to declare memoizedDoubler as a val instead of def if you want it to only call memoize once.
val memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
This answer has a good explanation on the difference between def and val. https://stackoverflow.com/a/12856386/37309
Aren't you declaring a new Map per invocation ?
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
rather than specifying one per instance of Aggregator ?
e.g.
class Aggregator{
private val cache = HashMap[Int, Int]()
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
To answer your question:
How to implement cache using functional programming
In functional programming there is no concept of mutable state. If you want to change something (like cache), you need to return updated cache instance along with the result and use it for the next call.
Here is modification of your code that follows that approach. function to calculate values and cache is incorporated into Aggregator. When memoize is called, it returns tuple, that contains calculation result (possibly taken from cache) and new Aggregator that should be used for the next call.
class Aggregator(function: Function[Int, Int], cache:Map[Int, Int] = Map.empty) {
def memoize:Int => (Int, Aggregator) = {
t:Int =>
cache.get(t).map {
res =>
(res, Aggregator.this)
}.getOrElse {
val res = function(t)
(res, new Aggregator(function, cache + (t -> res)))
}
}
}
object Aggregator {
def memoizedDoubler = new Aggregator((key:Int) => {
println("Evaluating..." + key)
key*2
})
def main(args: Array[String]) {
val (res, doubler1) = memoizedDoubler.memoize(2)
val (res1, doubler2) = doubler1.memoize(2)
val (res2, doubler3) = doubler2.memoize(3)
val (res3, doubler4) = doubler3.memoize(3)
}
}
This prints:
Evaluating...2
Evaluating...3

How do you write a json4s CustomSerializer that handles collections

I have a class that I am trying to deserialize using the json4s CustomSerializer functionality. I need to do this due to the inability of json4s to deserialize mutable collections.
This is the basic structure of the class I want to deserialize (don't worry about why the class is structured like this):
case class FeatureValue(timestamp:Double)
object FeatureValue{
implicit def ordering[F <: FeatureValue] = new Ordering[F] {
override def compare(a: F, b: F): Int = {
a.timestamp.compareTo(b.timestamp)
}
}
}
class Point {
val features = new HashMap[String, SortedSet[FeatureValue]]
def add(name:String, value:FeatureValue):Unit = {
val oldValue:SortedSet[FeatureValue] = features.getOrElseUpdate(name, SortedSet[FeatureValue]())
oldValue += value
}
}
Json4s serializes this just fine. A serialized instance might look like the following:
{"features":
{
"CODE0":[{"timestamp":4.8828914447482E8}],
"CODE1":[{"timestamp":4.8828914541333E8}],
"CODE2":[{"timestamp":4.8828915127325E8},{"timestamp":4.8828910097466E8}]
}
}
I've tried writing a custom deserializer, but I don't know how to deal with the list tails. In a normal matcher you can just call your own function recursively, but in this case the function is anonymous and being called through the json4s API. I cannot find any examples that deal with this and I can't figure it out.
Currently I can match only a single hash key, and a single FeatureValue instance in its value. Here is the CustomSerializer as it stands:
import org.json4s.{FieldSerializer, DefaultFormats, Extraction, CustomSerializer}
import org.json4s.JsonAST._
class PointSerializer extends CustomSerializer[Point](format => (
{
case JObject(JField("features", JObject(Nil)) :: Nil) => new Point
case JObject(List(("features", JObject(List(
(feature:String, JArray(List(JObject(List(("timestamp",JDouble(ts)))))))))
))) => {
val point = new Point
point.add(feature, FeatureValue(ts))
point
}
},
{
// don't need to customize this, it works fine
case x: Point => Extraction.decompose(x)(DefaultFormats + FieldSerializer[Point]())
}
))
If I try to change to using the :: separated list format, so far I have gotten compiler errors. Even if I didn't get compiler errors, I am not sure what I would do with them.
You can get the list of json features in your pattern match and then map over this list to get the Features and their codes.
class PointSerializer extends CustomSerializer[Point](format => (
{
case JObject(List(("features", JObject(featuresJson)))) =>
val features = featuresJson.flatMap {
case (code:String, JArray(timestamps)) =>
timestamps.map { case JObject(List(("timestamp",JDouble(ts)))) =>
code -> FeatureValue(ts)
}
}
val point = new Point
features.foreach((point.add _).tupled)
point
}, {
case x: Point => Extraction.decompose(x)(DefaultFormats + FieldSerializer[Point]())
}
))
Which deserializes your json as follows :
import org.json4s.native.Serialization.{read, write}
implicit val formats = Serialization.formats(NoTypeHints) + new PointSerializer
val json = """
{"features":
{
"CODE0":[{"timestamp":4.8828914447482E8}],
"CODE1":[{"timestamp":4.8828914541333E8}],
"CODE2":[{"timestamp":4.8828915127325E8},{"timestamp":4.8828910097466E8}]
}
}
"""
val point0 = read[Point]("""{"features": {}}""")
val point1 = read[Point](json)
point0.features // Map()
point1.features
// Map(
// CODE0 -> TreeSet(FeatureValue(4.8828914447482E8)),
// CODE2 -> TreeSet(FeatureValue(4.8828910097466E8), FeatureValue(4.8828915127325E8)),
// CODE1 -> TreeSet(FeatureValue(4.8828914541333E8))
// )

Scala: Cool way to manage sequential execution of Futures?

I'm trying to write a data module in Scala.
While loading entire data in parallel, some data depends on other data, so execution sequence has to be managed in efficient way.
For example in code, I keep a map with name of data and manifest
val dataManifestMap = Map(
"foo" -> manifest[String],
"bar" -> manifest[Int],
"baz" -> manifest[Int],
"foobar" -> manifest[Set[String]], // need to be executed after "foo" and "bar" is ready
"foobarbaz" -> manifest[String], // need to be executed after "foobar" and "baz" is ready
)
These data will be stored in a mutable hash map
private var dataStorage = new mutable.HashMap[String, Future[Any]]()
There are some code that will load data
def loadAllData(): Future[Unit] = {
Future.join(
(dataManifestMap map {
case (data, m) => loadData(data, m) } // function has all the string matching and loading stuff
).toSeq
)
}
def loadData[T](data: String, m: Manifest[T]): Future[Unit] = {
val d = data match {
case "foo" => Future.value("foo")
case "bar" => Future.value(3)
case "foobar" => // do something with dataStorage("foo") and dataStorage("bar")
... // and so forth (in a real example it would be much more complicated for sure)
}
d flatMap {
dVal => { this.synchronized { dataStorage(data) = dVal }; Future.value(Unit) }
}
}
This way, I cannot make sure "foobar" is loaded when "foo" and "bar" is ready, and so forth.
How can I manage this in a "cool" way, since I might have hundreds of different data?
It would be "awesome" if I could have some kind of data structure that has all the info about something has to be loaded after something, and sequential execution can be handled by flatMap in a neat way.
Thanks for the help in advance.
All things being equal, I'd tend to use for comprehensions. For example:
def findBucket: Future[Bucket[Empty]] = ???
def fillBucket(bucket: Bucket[Empty]): Future[Bucket[Water]] = ???
def extinguishOvenFire(waterBucket: Bucket[Water]): Future[Oven] = ???
def makeBread(oven: Oven): Future[Bread] = ???
def makeSoup(oven: Oven): Future[Soup] = ???
def eatSoup(soup: Soup, bread: Bread): Unit = ???
def doLunch = {
for (bucket <- findBucket;
filledBucket <- fillBucket(bucket);
oven <- extinguishOvenFire(filledBucket);
soupFuture = makeSoup(oven);
breadFuture = makeBread(oven);
soup <- soupFuture;
bread <- breadFuture) {
eatSoup(soup, bread)
}
}
This chains futures together, and calls the relevant methods once dependencies are satisfied. Note that we use = in the for comprehension to allow us to start two Futures at the same time. As it stands, doLunch returns Unit, but if you replace the last few lines with:
// ..snip..
bread <- breadFuture) yield {
eatSoup(soup, bread)
oven
}
}
Then it will return Future[Oven] - which might be useful if you want to use the oven for something else after lunch.
As for your code, my first though would be that you should consider Spray cache, as it looks like it might fit your requirements. If not, my next thought would be to replace the Stringly typed interface you've currently got and go with something based on typed method calls:
private def save[T](key: String)(value: Future[T]) = this.synchronized {
dataStorage(key) = value
value
}
def loadFoo = save("foo"){Future("foo")}
def loadBar = save("bar"){Future(3)}
def loadFooBar = save("foobar"){
for (foo <- loadFoo;
bar <- loadBar) yield foo + bar // Or whatever
}
def loadBaz = save("baz"){Future(200L)}
def loadAll = {
val topLevelFutures = Seq(loadFooBar, loadBaz)
// Use standard library function to combine futures
Future.fold(topLevelFutures)(())((u,f) => ())
}
// I don't consider this method necessary, but if you've got a legacy API to support...
def loadData[T](key: String)(implicit manifest: Manifest[T]) = {
val future = key match {
case "foo" => loadFoo
case "bar" => loadBar
case "foobar" => loadFooBar
case "baz" => loadBaz
case "all" => loadAll
}
future.mapTo[T]
}