I'm trying to write a data module in Scala.
While loading entire data in parallel, some data depends on other data, so execution sequence has to be managed in efficient way.
For example in code, I keep a map with name of data and manifest
val dataManifestMap = Map(
"foo" -> manifest[String],
"bar" -> manifest[Int],
"baz" -> manifest[Int],
"foobar" -> manifest[Set[String]], // need to be executed after "foo" and "bar" is ready
"foobarbaz" -> manifest[String], // need to be executed after "foobar" and "baz" is ready
)
These data will be stored in a mutable hash map
private var dataStorage = new mutable.HashMap[String, Future[Any]]()
There are some code that will load data
def loadAllData(): Future[Unit] = {
Future.join(
(dataManifestMap map {
case (data, m) => loadData(data, m) } // function has all the string matching and loading stuff
).toSeq
)
}
def loadData[T](data: String, m: Manifest[T]): Future[Unit] = {
val d = data match {
case "foo" => Future.value("foo")
case "bar" => Future.value(3)
case "foobar" => // do something with dataStorage("foo") and dataStorage("bar")
... // and so forth (in a real example it would be much more complicated for sure)
}
d flatMap {
dVal => { this.synchronized { dataStorage(data) = dVal }; Future.value(Unit) }
}
}
This way, I cannot make sure "foobar" is loaded when "foo" and "bar" is ready, and so forth.
How can I manage this in a "cool" way, since I might have hundreds of different data?
It would be "awesome" if I could have some kind of data structure that has all the info about something has to be loaded after something, and sequential execution can be handled by flatMap in a neat way.
Thanks for the help in advance.
All things being equal, I'd tend to use for comprehensions. For example:
def findBucket: Future[Bucket[Empty]] = ???
def fillBucket(bucket: Bucket[Empty]): Future[Bucket[Water]] = ???
def extinguishOvenFire(waterBucket: Bucket[Water]): Future[Oven] = ???
def makeBread(oven: Oven): Future[Bread] = ???
def makeSoup(oven: Oven): Future[Soup] = ???
def eatSoup(soup: Soup, bread: Bread): Unit = ???
def doLunch = {
for (bucket <- findBucket;
filledBucket <- fillBucket(bucket);
oven <- extinguishOvenFire(filledBucket);
soupFuture = makeSoup(oven);
breadFuture = makeBread(oven);
soup <- soupFuture;
bread <- breadFuture) {
eatSoup(soup, bread)
}
}
This chains futures together, and calls the relevant methods once dependencies are satisfied. Note that we use = in the for comprehension to allow us to start two Futures at the same time. As it stands, doLunch returns Unit, but if you replace the last few lines with:
// ..snip..
bread <- breadFuture) yield {
eatSoup(soup, bread)
oven
}
}
Then it will return Future[Oven] - which might be useful if you want to use the oven for something else after lunch.
As for your code, my first though would be that you should consider Spray cache, as it looks like it might fit your requirements. If not, my next thought would be to replace the Stringly typed interface you've currently got and go with something based on typed method calls:
private def save[T](key: String)(value: Future[T]) = this.synchronized {
dataStorage(key) = value
value
}
def loadFoo = save("foo"){Future("foo")}
def loadBar = save("bar"){Future(3)}
def loadFooBar = save("foobar"){
for (foo <- loadFoo;
bar <- loadBar) yield foo + bar // Or whatever
}
def loadBaz = save("baz"){Future(200L)}
def loadAll = {
val topLevelFutures = Seq(loadFooBar, loadBaz)
// Use standard library function to combine futures
Future.fold(topLevelFutures)(())((u,f) => ())
}
// I don't consider this method necessary, but if you've got a legacy API to support...
def loadData[T](key: String)(implicit manifest: Manifest[T]) = {
val future = key match {
case "foo" => loadFoo
case "bar" => loadBar
case "foobar" => loadFooBar
case "baz" => loadBaz
case "all" => loadAll
}
future.mapTo[T]
}
Related
I wrote simple callback(handler) function which i pass to async api and i want to wait for result:
object Handlers {
val logger: Logger = Logger("Handlers")
implicit val cs: ContextShift[IO] =
IO.contextShift(ExecutionContext.Implicits.global)
class DefaultHandler[A] {
val response: IO[MVar[IO, A]] = MVar.empty[IO, A]
def onResult(obj: Any): Unit = {
obj match {
case obj: A =>
println(response.flatMap(_.tryPut(obj)).unsafeRunSync())
println(response.flatMap(_.isEmpty).unsafeRunSync())
case _ => logger.error("Wrong expected type")
}
}
def getResponse: A = {
response.flatMap(_.take).unsafeRunSync()
}
}
But for some reason both tryPut and isEmpty(when i'd manually call onResult method) returns true, therefore when i calling getResponse it sleeps forever.
This is the my test:
class HandlersTest extends FunSuite {
test("DefaultHandler.test") {
val handler = new DefaultHandler[Int]
handler.onResult(3)
val response = handler.getResponse
assert(response != 0)
}
}
Can somebody explain why tryPut returns true, but nothing puts. And what is the right way to use Mvar/channels in scala?
IO[X] means that you have the recipe to create some X. So on your example, yuo are putting in one MVar and then asking in another.
Here is how I would do it.
object Handlers {
trait DefaultHandler[A] {
def onResult(obj: Any): IO[Unit]
def getResponse: IO[A]
}
object DefaultHandler {
def apply[A : ClassTag]: IO[DefaultHandler[A]] =
MVar.empty[IO, A].map { response =>
new DefaultHandler[A] {
override def onResult(obj: Any): IO[Unit] = obj match {
case obj: A =>
for {
r1 <- response.tryPut(obj)
_ <- IO(println(r1))
r2 <- response.isEmpty
_ <- IO(println(r2))
} yield ()
case _ =>
IO(logger.error("Wrong expected type"))
}
override def getResponse: IO[A] =
response.take
}
}
}
}
The "unsafe" is sort of a hint, but every time you call unsafeRunSync, you should basically think of it as an entire new universe. Before you make the call, you can only describe instructions for what will happen, you can't actually change anything. During the call is when all the changes occur. Once the call completes, that universe is destroyed, and you can read the result but no longer change anything. What happens in one unsafeRunSync universe doesn't affect another.
You need to call it exactly once in your test code. That means your test code needs to look something like:
val test = for {
handler <- TestHandler.DefaultHandler[Int]
_ <- handler.onResult(3)
response <- handler.getResponse
} yield response
assert test.unsafeRunSync() == 3
Note this doesn't really buy you much over just using the MVar directly. I think you're trying to mix side effects inside IO and outside it, but that doesn't work. All the side effects need to be inside.
I am trying to build a multi level Validator object that is fairly generic. The idea being you have levels of validations if Level 1 passes then you do Level 2, etc. but I am struggling with one specific area: creating a function call but not executing it until a later point.
Data:
case class FooData(alpha: String, beta: String) extends AllData
case class BarData(gamma: Int, delta: Int) extends AllData
ValidationError:
case class ValidationError(code: String, message: String)
Validator:
object Validator {
def validate(validations: List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Sample Validator:
object CorrectNameFormatValidator extends Validation {
def validate(str: String): Seq[ValidationError] = {
...
}
}
How I wish to use it:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
Am I doing something really convoluted when I don't need to be or am I just missing a component?
Essentially I want to define when a function will be called and with which parameters but I don't want the call to happen until I say within the Validator. Is this something that's possible?
You can use lazy val or def when defining levelOneValidation, levelTwoValidations and validationLevel:
object App {
def main(args: Array[String]): Unit = {
val fooData = FooData(alpha = "first", beta = "second")
lazy val levelOneValidations = List(
CorrectNameFormatValidator(fooData.alpha),
CorrectNameFormatValidator(fooData.beta),
SomeOtherValidator(fooData.beta)
)
// I don't want these to execute as function calls here
lazy val levelTwoValidations = List(
SomeLevelTwoValidator (fooData.alpha),
SomeLevelTwoValidator(fooData.beta),
SomeOtherLevelValidator(fooData.beta),
SomeOtherLevelValidator(fooData.alpha)
)
lazy val validationLevels = List(levelOneValidations, levelTwoValidations)
Validator.validate(validationLevels)
}
}
You also need to change validate method to get the validations ByName and not
ByValue using : => :
object Validator {
def validate(validations: => List[List[Validation]]): List[ValidationError] = {
validations match {
case head :: nil => // Execute the functions and get the results back
// Recursively work down the levels (below syntax may be incorrect)
case head :: tail => validate(head) ... // if no errors then validate(tail) etc.
...
}
}
}
Anyway, I think you can implement it differently by just using some OOP design patterns, like Chain of Responsibility.
I have a Future lazy val that obtains some object and a function which submits operations in the Future.
class C {
def printLn(s: String) = println(s)
}
lazy val futureC: Future[C] = Future{Thread.sleep(3000); new C()}
def func(s: String): Unit = {
futureC.foreach{c => c.printLn(s)}
}
The problem is when Future is completed it executes operations in reverse order than they have been submited. So for example if I execute sequentialy
func("A")
func("B")
func("C")
I get after Future completion
scala> C
B
A
This order is important for me. Is there a way to preserve this order?
Of course I can use an actor who asks for future and stashing strings while future is not ready, but it seems redundant for me.
lazy val futureC: Future[C]
lazy vals in scala will be compiled in to the code which uses a synchronized block for thread safety.
Here when the func(A) is called, it will obtain the lock for the lazy val and that thread will go to sleep.
Therefore func(B) & func(C) will blocked by the lock.
When those blocked threads are run, the order cannot be guaranteed.
If you do it like below, you'll have the order as you expect. This is because the for comprehension creates a flatMap, & map based chain that gets executed sequentially.
lazy val futureC: Future[C] = Future {
Thread.sleep(1000)
new C()
}
def func(s: String) : Future[Unit] = {
futureC.map { c => c.printLn(s) }
}
val x = for {
_ <- func("A")
_ <- func("B")
_ <- func("C")
} yield ()
The order preserves even without the lazy keyword. You can remove the lazy keyword unless it is really necessary.
Hope this helps.
You can use Future.traverse to ensure the order of execution.
Something like this.. Im not sure how your func has a reference to the correct futureC, so I moved it inside.
def func(s: String): Future[Unit] = {
lazy val futureC = Future{Thread.sleep(3000); new C()}
futureC.map{c => c.printLn(s)}
}
def traverse[A,B](xs: Seq[A])(fn: A => Future[B]): Future[Seq[B]] =
xs.foldLeft(Future(Seq[B]())) { (acc, item) =>
acc.flatMap { accValue =>
fn(item).map { itemValue =>
accValue :+ itemValue
}
}
}
traverse(Seq("A","B","C"))(func)
How to implement cache using functional programming
A few days ago I came across callbacks and proxy pattern implementation using scala.
This code should only apply inner function if the value is not in the map.
But every time map is reinitialized and values are gone (which seems obivous.
How to use same cache again and again between different function calls
class Aggregator{
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
(t:Int) => {
if (!cache.contains(t)) {
println("Evaluating..."+t)
val r = function.apply(t);
cache.put(t,r)
r
}
else
{
cache.get(t).get;
}
}
}
def memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
}
object Aggregator {
def main( args: Array[String] ) {
val agg = new Aggregator()
agg.memoizedDoubler(2)
agg.memoizedDoubler(2)// It should not evaluate again but does
agg.memoizedDoubler(3)
agg.memoizedDoubler(3)// It should not evaluate again but does
}
I see what you're trying to do here, the reason it's not working is that every time you call memoizedDoubler it's first calling memorize. You need to declare memoizedDoubler as a val instead of def if you want it to only call memoize once.
val memoizedDoubler = memoize( (key:Int) => {
println("Evaluating...")
key*2
})
This answer has a good explanation on the difference between def and val. https://stackoverflow.com/a/12856386/37309
Aren't you declaring a new Map per invocation ?
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
val cache = HashMap[Int, Int]()
rather than specifying one per instance of Aggregator ?
e.g.
class Aggregator{
private val cache = HashMap[Int, Int]()
def memoize(function: Function[Int, Int] ):Function[Int,Int] = {
To answer your question:
How to implement cache using functional programming
In functional programming there is no concept of mutable state. If you want to change something (like cache), you need to return updated cache instance along with the result and use it for the next call.
Here is modification of your code that follows that approach. function to calculate values and cache is incorporated into Aggregator. When memoize is called, it returns tuple, that contains calculation result (possibly taken from cache) and new Aggregator that should be used for the next call.
class Aggregator(function: Function[Int, Int], cache:Map[Int, Int] = Map.empty) {
def memoize:Int => (Int, Aggregator) = {
t:Int =>
cache.get(t).map {
res =>
(res, Aggregator.this)
}.getOrElse {
val res = function(t)
(res, new Aggregator(function, cache + (t -> res)))
}
}
}
object Aggregator {
def memoizedDoubler = new Aggregator((key:Int) => {
println("Evaluating..." + key)
key*2
})
def main(args: Array[String]) {
val (res, doubler1) = memoizedDoubler.memoize(2)
val (res1, doubler2) = doubler1.memoize(2)
val (res2, doubler3) = doubler2.memoize(3)
val (res3, doubler4) = doubler3.memoize(3)
}
}
This prints:
Evaluating...2
Evaluating...3
On the first line below, I want to know if there's anything I can or should put to the left of the = that will future-proof against bad editing. I had one Action.async, but now I want to have two choices (run a single MongoDB query or run multiple MongoDB queries) to arrive at the answer to the web query q. My code compiles as-is, but I think I should have to put something before the = to make the code more type-safe.
def q(arg: String) =
if (wantMultipleQueriesByDataSource)
runMultipleQueriesByDataSource(arg)
else
runSingleQuery(arg)
def runSingleQuery(arg: String) = Action.async {
if (oDb.isDefined) {
val collection: MongoCollection[Document] = oDb.get.getCollection(collectionName)
val fut = getMapData(collection, arg)
fut.map { docs: Seq[Document] =>
val docsSources = List(DocsSource(docs, "*"))
val pickedDocs = pickDocs(args, docsSources)
Ok(buildQueryAnswer(pickedDocs))
} recover {
case e => BadRequest("FAIL(rect): " + e.getMessage)
}
}
else Future.successful(Ok(buildQueryAnswer(Nil)))
}
def runMultipleQueriesByDataSource(arg: String) = Action.async {
if (oDb.isDefined) {
val collection: MongoCollection[Document] = oDb.get.getCollection(collectionName)
val dataSources = List("apples", "bananas", "cherries")
val futureCollections = dataSources map { getMapDataByDataSource(collection, _, arg) }
Future.sequence(futureCollections) map {
case docsGroupedByDataSource: Seq[Seq[Document]] =>
val docsSources = (docsGroupedByDataSource zip dataSources) map {x => DocsSource(x._1, x._2)}
val pickedDocs = pickDocs(args, docsSources)
Ok(buildQueryAnswer(pickedDocs))
case _ => Ok(buildQueryAnswer(Nil))
} recover {
case e => BadRequest("FAIL(rect): " + e.getMessage)
}
}
else Future.successful(Ok(buildQueryAnswer(Nil)))
}
You don't need to because Scala has type inference in this case. But if you want, you can change the method signature to the following:
def q(arg: String): Action[AnyContent]
I'm assuming here that you are using Playframework.
Edit: In fact, in this case, Action.async return an Action[AnyContent] and it receives a block that returns a Future[Result].