Scala MapN with conditions

Hi, I have the following Scala code using the cats library:
results = patrons.map(p => {
  (verifyCardId(p.cardId), verifyAddress(p.address)).map2(
    (maybeValidCard, maybeValidAddress) => {
      val result = for {
        idCheck <- maybeValidCard
        addressCheck <- maybeValidAddress
      } yield CheckResult(p.name, idCheck, addressCheck)
    }
  )
})
where verifyCardId and verifyAddress are external API calls returning Futures, and both are quite expensive and time consuming.
The question is how do I do the following:
If a patron does not have a card, the code should skip the card check but must still check the patron's address
If the patron has both, the code should check both the card and the address
How can I improve the existing code? Thanks heaps
Edit:
Added more information about the preference to skip one of the expensive API calls

If the result of the address verification doesn't depend on the result of the card verification, then untie them (flatMap binds monads).
The CheckResult model should be adjusted to the case of a missing card.
Extended sample:
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global
// plus the cats syntax import used for .map2 on the (Future, Future) tuple

case class Patron(name: String, cardId: String, address: String)
case class CheckResult(name: String, idCheck: Option[Boolean], addressCheck: Boolean)

def verifyCardId(cardId: String) = Future {
  Thread.sleep(5000)
  Some(true)
}

def verifyAddress(address: String) = Future {
  Thread.sleep(5000)
  Some(true)
}

val patrons = List(Patron("p_name", "1234", "Somewhere St. 42"))

val start = LocalDateTime.now()
val results = patrons.map(p => {
  (verifyCardId(p.cardId), verifyAddress(p.address)).map2(
    (maybeValidCard, maybeValidAddress) => {
      for {
        addressCheck <- maybeValidAddress
      } yield CheckResult(p.name, maybeValidCard, addressCheck)
    })
})
val headResult = Await.result(results.head, Duration.Inf)
val end = LocalDateTime.now()
val duration = ChronoUnit.SECONDS.between(start, end)
Short output:
patrons: List[Patron] = List(Patron(p_name,1234,Somewhere St. 42))
headResult: Option[CheckResult] = Some(CheckResult(p_name,Some(true),true))
duration: Long = 5
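If the goal is also to avoid firing the expensive card API call at all for patrons without a card, one option (a sketch, not part of the answer above; it assumes cardId is remodeled as Option[String] and reuses the verifyCardId/verifyAddress/CheckResult definitions from the extended sample) is to build the card Future conditionally:

case class Patron(name: String, cardId: Option[String], address: String)

def checkPatron(p: Patron): Future[Option[CheckResult]] = {
  // only call the expensive card API when the patron actually has a card
  val cardCheck: Future[Option[Boolean]] =
    p.cardId.fold(Future.successful(Option.empty[Boolean]))(verifyCardId)
  // the address check always runs; both futures are started before the for-comprehension
  val addressCheck = verifyAddress(p.address)
  for {
    maybeValidCard    <- cardCheck
    maybeValidAddress <- addressCheck
  } yield maybeValidAddress.map(a => CheckResult(p.name, maybeValidCard, a))
}

val results = patrons.map(checkPatron) // then Await/collect the results as before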


Scala Action Map Implementation Issue (follow up)

This is a fairly long-winded question and a follow-up to my last one.
I have the following code for an application being built. I am looking to call the function in handleOne, but it is not working in the action map; I think this is due to the Unit assigned to statesVotes in the handler. The goal is to create a menu-driven application that performs a set of desired functions. The function in question here is: get all the state values and display them suitably formatted.
I may have to turn the states into a Map, but I am looking for the same functionality as the case class.
import scala.io.StdIn.readInt
object myApp3 extends App{
val dataRE = "([^(]+) \\((\\d+)\\),(.+)".r
val pVotes = "([^:]+):(\\d+)".r
case class State(name : String
,code : Int
,parties : Array[(String,Int)])
val states: List[State] =
util.Using(io.Source.fromFile("filename.txt"))(_.getLines().toList)
.get //will throw if read file fails
.collect{case dataRE(name,code,votes) =>
State(name.trim
,code.toInt
,votes.split(",")
.collect{case pVotes(p,v) => (p,v.toInt)}
)
}
val actionMap = Map[Int, () => Boolean](1 -> handleOne)
var opt = 0
do{
opt = readOption
} while (menu(opt))
def readOption: Int = {
println(
"""|Please select one of the following:
| 1 - Show All States and Votes
| 2 - CW Option 2
| 3 - quit""".stripMargin)
readInt()
}
def menu(option: Int): Boolean = {
actionMap.get(option) match {
case Some(f) => f()
case None =>
println("Command not recognized!")
true
}
}
// handle one calls function mnuShowStatesVotes, which invokes function statesVotes
def handleOne(): Boolean = {
mnuShowStatesVotes(statesVotes : List[State])
true
}
def mnuShowStatesVotes(f:() => List[State]) = {
f() foreach(println())
}
def statesVotes = states.sortBy(_.name) //alphabetical order of states
.foreach{ st =>
println(st.name) //show line by split by state name
st.parties
.sortBy(-_._2) //sorts parties by votes in descending order
.map{case (p,v) => f"\t$p%-12s:$v%9d"}
.foreach(println)
}
}
Essentially, I want the menu option handleOne to correctly invoke the function in statesVotes.
The text file being used can be found below:
Alabama (9),Democratic:849624,Republican:1441170,Libertarian:25176,Others:7312
Alaska (3),Democratic:153778,Republican:189951,Libertarian:8897,Others:6904
Arizona (11),Democratic:1672143,Republican:1661686,Libertarian:51465,Green:1557,Others:475
It seems to me that your code would benefit from adopting a clear and distinct separation of roles and responsibilities.
Let's get the preliminaries taken care of.
import scala.util.{Try, Success, Failure, Using}
case class State(name : String
,code : Int
,parties : Array[(String,Int)])
Now let's parse the input data.
This code has one job to do: load the data from the input file. It takes one parameter, the input filename, and returns either Success() with the accumulated data, or Failure() with the error exception.
def readFile(filename: String): Try[List[State]] = {
val dataRE = "([^(]+) \\((\\d+)\\),(.+)".r
val pVotes = "([^:]+):(\\d+)".r
Using(io.Source.fromFile(filename)) {
_.getLines()
.toList
.collect{ case dataRE(name, code, votes) =>
State(name.trim
,code.toInt
,votes.split(",")
.collect{case pVotes(p,v) => (p,v.toInt)})
}
}
}
Note that collect() will simply ignore file data that doesn't fit the expected format. If you were to use map() instead, then bad input data would cause a Failure().
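For example, with the dataRE regex above and an input line that doesn't match it (a small illustration, not part of the original answer):

// collect silently drops the non-matching line
List("not a valid line").collect{ case dataRE(name, code, votes) => name } // -> List()
// map forces the partial function on every line, so the same input throws a
// MatchError, which the surrounding Using block would surface as a Failure(...)
// List("not a valid line").map{ case dataRE(name, code, votes) => name }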
Now let's put all the output methods, and their descriptions, under one roof. This is most of what the user will see.
class Menu(states: List[State]) {
def apply(key: String): Boolean = {
val (_, op, continue) = lookup(key)
op()
continue
}
private val lookup: Map[String,(String,()=>Unit,Boolean)] =
Map("?" -> ("show this menu", menu _, true)
,"menu" -> ("show this menu", menu _, true)
,"all" -> ("display all voting data", all _, true)
,"st" -> ("vote totals by state", stVotes _, true)
,"x" -> ("exit", done _, false)
,"quit" -> ("exit", done _, false)
).withDefaultValue(("",unknown _, true))
private def done(): Unit = println("bye")
private def unknown(): Unit =
println("unknown selection ('?' for main menu)")
private def menu(): Unit =
lookup.keys.toVector.sorted
.map(k => s"$k\t: ${lookup(k)._1}")
.foreach(println)
private def all(): Unit =
states.sortBy(_.name) //alphabetical
.foreach{ st =>
println(st.name) //state name
st.parties
.sortBy(-_._2) //votes in decreasing order
.map{case (p,v) => f"\t$p%-12s:$v%9d"}
.foreach(println)
}
private def stVotes(): Unit =
states.map(st => (st.name, st.parties.map(_._2).sum))
.sortBy(-_._2) //votes in decreasing order
.map{case (state,total) => f"$state%-9s:$total%8d"}
.foreach(println)
}
Notice that only the apply() method is public. Everything else is private and under wraps.
To create a new data report you just add an entry in the lookup Map and add the new method to produce the output.
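For example, a hypothetical extra report that prints the sum of all state codes (the "ev" key and the ev method are illustrative names, not from the original code) would only need one new entry and one new private method:

,"ev" -> ("sum of all state codes", ev _, true)

private def ev(): Unit =
  println(states.map(_.code).sum)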
Now all we need is the code to tie the pieces together and to take user input.
def main(args: Array[String]): Unit =
args.headOption.map(readFile) match {
case None =>
println(s"usage: ${this.productPrefix} <data_file>")
case Some(Failure(exc)) =>
println(s"Error reading data file: $exc")
case Some(Success(stateData)) =>
val menu = new Menu(stateData)
menu("menu")
Iterator.continually(menu(io.StdIn.readLine(">> ").toLowerCase))
.dropWhile(identity)
.next()
}
Note that this.productPrefix is made available if the surrounding object is a case object.

Chain Scala Futures when processing a Seq of objects?

import scala.concurrent.duration.Duration
import scala.concurrent.duration.Duration._
import scala.concurrent.{Await, Future}
import scala.concurrent.Future._
import scala.concurrent.ExecutionContext.Implicits.global
object TestClass {
final case class Record(id: String)
final case class RecordDetail(id: String)
final case class UploadResult(result: String)
val ids: Seq[String] = Seq("a", "b", "c", "d")
def fetch(id: String): Future[Option[Record]] = Future {
Thread sleep 100
if (id != "b" && id != "d") {
Some(Record(id))
} else None
}
def fetchRecordDetail(record: Record): Future[RecordDetail] = Future {
Thread sleep 100
RecordDetail(record.id + "_detail")
}
def upload(recordDetail: RecordDetail): Future[UploadResult] = Future {
Thread sleep 100
UploadResult(recordDetail.id + "_uploaded")
}
def notifyUploaded(results: Seq[UploadResult]): Unit = println("notified " + results)
def main(args: Array[String]): Unit = {
//for each id from ids, call fetch method and if record exists call fetchRecordDetail
//and after that upload RecordDetail, collect all UploadResults into seq
//and call notifyUploaded with that seq and await result, you should see "notified ...." in console
// In the following line of code how do I pass result of fetch to fetchRecordDetail function
val result = Future.traverse(ids)(x => Future(fetch(x)))
// val result: Future[Unit] = ???
Await.ready(result, Duration.Inf)
}
}
My problem is that I don't know what code to put in main to make it work as described in the comments. To sum up, I have ids: Seq[String] and I want each id to go through the asynchronous methods fetch, fetchRecordDetail and upload, and finally the whole Seq to reach notifyUploaded.
I think the simplest way to do it is:
def main(args: Array[String]): Unit = {
  //for each id from ids, call fetch method and if record exists call fetchRecordDetail
  //and after that upload RecordDetail, collect all UploadResults into seq
  //and call notifyUploaded with that seq and await result, you should see "notified ...." in console

  // helper: run f only when the Option is defined
  def runWithOption[A, B](f: A => Future[B], oa: Option[A]): Future[Option[B]] = oa match {
    case Some(a) => f(a).map(b => Some(b))
    case None => Future.successful(None)
  }

  val ids: Seq[String] = Seq("a", "b", "c", "d")
  val resultSeq: Seq[Future[Option[UploadResult]]] = ids.map(id => {
    for (or: Option[Record] <- fetch(id);
         ord: Option[RecordDetail] <- runWithOption(fetchRecordDetail, or);
         our: Option[UploadResult] <- runWithOption(upload, ord)
    ) yield our
  })
  val filteredResult: Future[Seq[UploadResult]] = Future.sequence(resultSeq).map(s => s.collect({ case Some(ur) => ur }))
  // also needs: import scala.util.Success (for the andThen below)
  val result: Future[Seq[UploadResult]] = filteredResult.andThen({ case Success(s) => notifyUploaded(s) })
  Await.ready(result, Duration.Inf)
}
The idea is that you first get a Seq[Future[_]] by mapping each id through all the methods (here it is done with a for-comprehension). An important trick is to actually pass Seq[Future[Option[_]]] around: threading the Option[_] through the whole chain via the runWithOption helper simplifies the code a lot, without needing to block before the very last stage.
Then you convert the Seq[Future[_]] into a Future[Seq[_]] and filter out results for the ids that failed at the fetch stage. Finally you apply notifyUploaded.
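For what it's worth, the map-then-sequence steps can also be collapsed into a single Future.traverse call. A sketch under the same assumptions (the fetch/fetchRecordDetail/upload/notifyUploaded methods and the runWithOption helper defined above, plus import scala.util.Success):

val result: Future[Seq[UploadResult]] =
  Future.traverse(ids) { id =>
    for {
      or  <- fetch(id)                            // Option[Record]
      ord <- runWithOption(fetchRecordDetail, or) // Option[RecordDetail]
      our <- runWithOption(upload, ord)           // Option[UploadResult]
    } yield our
  }.map(_.collect { case Some(ur) => ur })        // drop ids that failed at the fetch stage
   .andThen { case Success(s) => notifyUploaded(s) }

Await.ready(result, Duration.Inf)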
P.S. Note that there is no error handling in this code whatsoever and it is not clear how you expect it to behave in case of errors at different stages.

Iterate data source asynchronously in batch and stop while remote return no data in Scala

Let's say we have a fake data source which will return the data it holds in batches:
class DataSource(size: Int) {
private var s = 0
implicit val g = scala.concurrent.ExecutionContext.global
def getData(): Future[List[Int]] = {
s = s + 1
Future {
Thread.sleep(Random.nextInt(s * 100))
if (s <= size) {
List.fill(100)(s)
} else {
List()
}
}
}
}
object Test extends App {
val source = new DataSource(100)
implicit val g = scala.concurrent.ExecutionContext.global
def process(v: List[Int]): Unit = {
println(v)
}
def next(f: (List[Int]) => Unit): Unit = {
val fut = source.getData()
fut.onComplete {
case Success(v) => {
f(v)
v match {
case h :: t => next(f)
}
}
}
}
next(process)
Thread.sleep(1000000000)
}
I have my version above, but the problem is that part of it is not really pure. Ideally, I would like to wrap the Future for each batch into one big future, and have that wrapper future succeed when the last batch returns an empty list. My situation is a little different from this post: the next() there is a synchronous call, while mine is also async.
Or is it even possible to do what I want? The next batch should only be fetched when the previous one has resolved, and in the end whether to fetch another batch depends on the size returned.
What's the best way to walk through this type of data source? Are there any existing Scala frameworks that provide the feature I am looking for? Are Play's Iteratee, Enumerator and Enumeratee the right tools? If so, can anyone provide an example of how to use those facilities to implement what I am looking for?
Edit:
With help from chunjef, I tried it out and it actually worked for me, with a small change based on his answer:
Source.fromIterator(() => Iterator.continually(source.getData()))
  .mapAsync(1)(f => f.filter(_.size > 0))
  .via(Flow[List[Int]].takeWhile(_.nonEmpty))
  .runForeach(println)
However, can someone give a comparison between Akka Streams and Play's Iteratee? Is it worth also trying out Iteratee?
Code snippet 1:
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
Code snippet 2: Assuming getData depends on some output of another flow, I would like to concat it with the flow below. However, it yields a "too many open files" error. I'm not sure what causes this error, since mapAsync has its throughput limited to 1, if I understood correctly.
Flow[Int].mapConcat[Future[List[Int]]](c => {
Iterator.continually(ds.getData(c)).to[collection.immutable.Iterable]
}).mapAsync(1)(identity).takeWhile(_.nonEmpty).runForeach(println)
The following is one way to achieve the same behavior with Akka Streams, using your DataSource class:
import scala.concurrent.Future
import scala.util.Random
import akka.actor.ActorSystem
import akka.stream._
import akka.stream.scaladsl._
object StreamsExample extends App {
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
val ds = new DataSource(100)
Source.fromIterator(() => Iterator.continually(ds.getData)) // line 1
.mapAsync(1)(identity) // line 2
.takeWhile(_.nonEmpty) // line 3
.runForeach(println) // line 4
}
class DataSource(size: Int) {
...
}
A simplified line-by-line overview:
line 1: Creates a stream source that continually calls ds.getData if there is downstream demand.
line 2: mapAsync is a way to deal with stream elements that are Futures. In this case, the stream elements are of type Future[List[Int]]. The argument 1 is the level of parallelism: we specify 1 here because DataSource internally uses a mutable variable, and a parallelism level greater than one could produce unexpected results. identity is shorthand for x => x, which basically means that for each Future, we pass its result downstream without transforming it.
line 3: Essentially, ds.getData is called as long as the result of the Future is a non-empty List[Int]. If an empty List is encountered, processing is terminated.
line 4: runForeach here takes a function List[Int] => Unit and invokes that function for each stream element.
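One more detail worth calling out (implicit in the snippet above): runForeach materializes a Future[Done] that completes when takeWhile ends the stream, which is exactly the single "wrapper future" the question asks for. A minimal sketch, assuming akka.Done is imported and an ExecutionContext (e.g. system.dispatcher) is in scope:

val done: Future[Done] =
  Source.fromIterator(() => Iterator.continually(ds.getData))
    .mapAsync(1)(identity)
    .takeWhile(_.nonEmpty)
    .runForeach(println)

done.onComplete(_ => system.terminate()) // e.g. shut the ActorSystem down when the stream finishes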
Ideally, I would like to wrap the Future for each batch into one big future, and have that wrapper future succeed when the last batch returns an empty list?
I think you are looking for a Promise.
You would set up a Promise before you start the first iteration.
This gives you promise.future, a Future that you can then use to follow the completion of everything.
In your onComplete, you add a case _ => promise.success(()).
Something like
// assumes: import scala.concurrent.{Future, Promise}
//          import scala.util.{Success, Failure}
def loopUntilDone(f: (List[Int]) => Unit): Future[Unit] = {
  val promise = Promise[Unit]()
  def next(): Unit = source.getData().onComplete {
    case Success(v) =>
      f(v)
      v match {
        case h :: t => next()
        case _ => promise.success(())
      }
    case Failure(e) => promise.failure(e)
  }
  // get going
  next()
  // return the Future for everything
  promise.future
}
// future for everything, this is a `Future[Unit]`
// its `onComplete` will be triggered when there is no more data
val everything = loopUntilDone(process)
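A small usage sketch of that wrapper Future (assuming scala.util.{Success, Failure} and the implicit ExecutionContext from the question's code are in scope):

everything.onComplete {
  case Success(_) => println("all batches processed")
  case Failure(e) => println(s"stopped with error: $e")
}
// or block until it finishes:
// Await.result(everything, Duration.Inf)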
You are probably looking for a reactive streams library. My personal favorite (and the one I'm most familiar with) is Monix. This is how it would work with DataSource unchanged:
import scala.concurrent.duration.Duration
import scala.concurrent.Await
import monix.reactive.Observable
import monix.execution.Scheduler.Implicits.global
object Test extends App {
val source = new DataSource(100)
val completed = // <- this is Future[Unit], completes when foreach is done
Observable.repeat(Observable.fromFuture(source.getData()))
.flatten // <- Here it's Observable[List[Int]], it has collection-like methods
.takeWhile(_.nonEmpty)
.foreach(println)
Await.result(completed, Duration.Inf)
}
I just figured out that using flatMapConcat achieves what I wanted. There is no point starting another question since I already have the answer, so I'm putting my sample code here in case someone is looking for a similar solution.
This type of API is very common in integrations between traditional enterprise applications. The DataSource class mocks the API, while the Test object demonstrates how client code can use Akka Streams to consume it.
In my small project the API was provided over SOAP, and I used scalaxb to transform the SOAP interface into Scala async style. With the client calls demonstrated in the Test object, we can consume the API with Akka Streams. Thanks all for the help.
// assumes: import scala.collection.mutable
//          import scala.concurrent.Future
//          import scala.util.Random
class DataSource(size: Int) {
private var transactionId: Long = 0
private val transactionCursorMap: mutable.HashMap[TransactionId, Set[ReadCursorId]] = mutable.HashMap.empty
private val cursorIteratorMap: mutable.HashMap[ReadCursorId, Iterator[List[Int]]] = mutable.HashMap.empty
implicit val g = scala.concurrent.ExecutionContext.global
case class TransactionId(id: Long)
case class ReadCursorId(id: Long)
def startTransaction(): Future[TransactionId] = {
Future {
synchronized {
transactionId += 1 // increment the transaction counter
}
val t = TransactionId(transactionId)
transactionCursorMap.update(t, Set(ReadCursorId(0)))
t
}
}
def createCursorId(t: TransactionId): ReadCursorId = {
synchronized {
val c = transactionCursorMap.getOrElseUpdate(t, Set(ReadCursorId(0)))
val currentId = c.foldLeft(0L) { (acc, a) => acc.max(a.id) }
val cId = ReadCursorId(currentId + 1)
transactionCursorMap.update(t, c + cId)
cursorIteratorMap.put(cId, createIterator)
cId
}
}
def createIterator(): Iterator[List[Int]] = {
(for {i <- 1 to 100} yield List.fill(100)(i)).toIterator
}
def startRead(t: TransactionId): Future[ReadCursorId] = {
Future {
createCursorId(t)
}
}
def getData(cursorId: ReadCursorId): Future[List[Int]] = {
synchronized {
Future {
Thread.sleep(Random.nextInt(100))
cursorIteratorMap.get(cursorId) match {
case Some(i) => i.next()
case _ => List()
}
}
}
}
}
// assumes the same Akka Streams imports as the earlier StreamsExample (akka.actor.ActorSystem, akka.stream._, akka.stream.scaladsl._)
object Test extends App {
val source = new DataSource(10)
implicit val system = ActorSystem("Sandbox")
implicit val materializer = ActorMaterializer()
implicit val g = scala.concurrent.ExecutionContext.global
val s = Source.fromFuture(source.startTransaction())
.map { e =>
source.startRead(e)
}
.mapAsync(1)(identity)
.flatMapConcat(
e => {
Source.fromIterator(() => Iterator.continually(source.getData(e)))
})
.mapAsync(5)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runForeach(println)
/*
val done = Source.fromIterator(() => Iterator.continually(source.getData())).mapAsync(1)(identity)
.via(Flow[List[Int]].takeWhile(_.nonEmpty))
.runFold(List[List[Int]]()) { (acc, r) =>
// println("=======" + acc + r)
r :: acc
}
done.onSuccess {
case e => {
e.foreach(println)
}
}
done.onComplete(_ => system.terminate())
*/
}

Scala: Cool way to manage sequential execution of Futures?

I'm trying to write a data module in Scala.
While loading all the data in parallel, some data depends on other data, so the execution sequence has to be managed in an efficient way.
For example, in the code I keep a map with the name of each piece of data and its manifest:
val dataManifestMap = Map(
"foo" -> manifest[String],
"bar" -> manifest[Int],
"baz" -> manifest[Int],
"foobar" -> manifest[Set[String]], // need to be executed after "foo" and "bar" is ready
"foobarbaz" -> manifest[String], // need to be executed after "foobar" and "baz" is ready
)
The data will be stored in a mutable hash map:
private var dataStorage = new mutable.HashMap[String, Future[Any]]()
There is some code that will load the data:
def loadAllData(): Future[Unit] = {
Future.join(
(dataManifestMap map {
case (data, m) => loadData(data, m) } // function has all the string matching and loading stuff
).toSeq
)
}
def loadData[T](data: String, m: Manifest[T]): Future[Unit] = {
val d = data match {
case "foo" => Future.value("foo")
case "bar" => Future.value(3)
case "foobar" => // do something with dataStorage("foo") and dataStorage("bar")
... // and so forth (in a real example it would be much more complicated for sure)
}
d flatMap {
dVal => { this.synchronized { dataStorage(data) = dVal }; Future.value(Unit) }
}
}
This way, I cannot make sure "foobar" is loaded only when "foo" and "bar" are ready, and so forth.
How can I manage this in a "cool" way, since I might have hundreds of different pieces of data?
It would be "awesome" if I could have some kind of data structure that holds all the info about what has to be loaded after what, so that sequential execution can be handled by flatMap in a neat way.
Thanks for the help in advance.
All things being equal, I'd tend to use for comprehensions. For example:
def findBucket: Future[Bucket[Empty]] = ???
def fillBucket(bucket: Bucket[Empty]): Future[Bucket[Water]] = ???
def extinguishOvenFire(waterBucket: Bucket[Water]): Future[Oven] = ???
def makeBread(oven: Oven): Future[Bread] = ???
def makeSoup(oven: Oven): Future[Soup] = ???
def eatSoup(soup: Soup, bread: Bread): Unit = ???
def doLunch = {
for (bucket <- findBucket;
filledBucket <- fillBucket(bucket);
oven <- extinguishOvenFire(filledBucket);
soupFuture = makeSoup(oven);
breadFuture = makeBread(oven);
soup <- soupFuture;
bread <- breadFuture) {
eatSoup(soup, bread)
}
}
This chains futures together, and calls the relevant methods once dependencies are satisfied. Note that we use = in the for comprehension to allow us to start two Futures at the same time. As it stands, doLunch returns Unit, but if you replace the last few lines with:
// ..snip..
bread <- breadFuture) yield {
eatSoup(soup, bread)
oven
}
}
Then it will return Future[Oven] - which might be useful if you want to use the oven for something else after lunch.
As for your code, my first thought would be that you should consider Spray cache, as it looks like it might fit your requirements. If not, my next thought would be to replace the stringly-typed interface you've currently got with something based on typed method calls:
private def save[T](key: String)(value: Future[T]) = this.synchronized {
dataStorage(key) = value
value
}
def loadFoo = save("foo"){Future("foo")}
def loadBar = save("bar"){Future(3)}
def loadFooBar = save("foobar"){
for (foo <- loadFoo;
bar <- loadBar) yield foo + bar // Or whatever
}
def loadBaz = save("baz"){Future(200L)}
def loadAll = {
val topLevelFutures = Seq(loadFooBar, loadBaz)
// Use standard library function to combine futures
Future.fold(topLevelFutures)(())((u,f) => ())
}
// I don't consider this method necessary, but if you've got a legacy API to support...
def loadData[T](key: String)(implicit manifest: Manifest[T]) = {
val future = key match {
case "foo" => loadFoo
case "bar" => loadBar
case "foobar" => loadFooBar
case "baz" => loadBaz
case "all" => loadAll
}
future.mapTo[T]
}
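A usage sketch of the typed interface above (assuming scala.concurrent Futures with an implicit ExecutionContext in scope, and the sample loaders defined earlier):

import scala.concurrent.Await
import scala.concurrent.duration._

Await.result(loadAll, 30.seconds) // loads foo/bar/foobar/baz, respecting the dependencies
val foobar = Await.result(loadData[String]("foobar"), 30.seconds)
println(foobar) // "foo3" with the sample loaders above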

How do I rewrite a for loop with a shared dependency using actors

We have some code which needs to run faster. It's already been profiled, so we would like to make use of multiple threads. Usually I would set up an in-memory queue and have a number of threads taking jobs off the queue and calculating the results. For the shared data I would use a ConcurrentHashMap or similar.
I don't really want to go down that route again. From what I have read, using actors will result in cleaner code, and if I use Akka, migrating to more than one JVM should be easier. Is that true?
However, I don't know how to think in actors, so I am not sure where to start.
To give a better idea of the problem, here is some sample code:
case class Trade(price:Double, volume:Int, stock:String) {
def value(priceCalculator: PriceCalculator) =
  (priceCalculator.priceFor(stock) - price) * volume
}
class PriceCalculator {
def priceFor(stock:String) = {
Thread.sleep(20)//a slow operation which can be cached
50.0
}
}
object ValueTrades {
def valueAll(trades:List[Trade],
priceCalculator:PriceCalculator):List[(Trade,Double)] = {
trades.map { trade => (trade,trade.value(priceCalculator)) }
}
def main(args:Array[String]) {
val trades = List(
Trade(30.5, 10, "Foo"),
Trade(30.5, 20, "Foo")
//usually much longer
)
val priceCalculator = new PriceCalculator
val values = valueAll(trades, priceCalculator)
}
}
I'd appreciate it if someone with experience using actors could suggest how this would map on to actors.
This is a complement to my comment on shared results for expensive calculations. Here it is:
import scala.actors._
import Actor._
import Futures._
case class PriceFor(stock: String) // Ask for result
// The following could be an "object" as well, if it's supposed to be singleton
class PriceCalculator extends Actor {
val map = new scala.collection.mutable.HashMap[String, Future[Double]]()
def act = loop {
react {
case PriceFor(stock) => reply(map getOrElseUpdate (stock, future {
Thread.sleep(2000) // a slow operation
50.0
}))
}
}
}
Here's a usage example:
scala> val pc = new PriceCalculator; pc.start
pc: PriceCalculator = PriceCalculator#141fe06
scala> class Test(stock: String) extends Actor {
| def act = {
| println(System.currentTimeMillis().toString+": Asking for stock "+stock)
| val f = (pc !? PriceFor(stock)).asInstanceOf[Future[Double]]
| println(System.currentTimeMillis().toString+": Got the future back")
| val res = f.apply() // this blocks until the result is ready
| println(System.currentTimeMillis().toString+": Value: "+res)
| }
| }
defined class Test
scala> List("abc", "def", "abc").map(new Test(_)).map(_.start)
1269310737461: Asking for stock abc
res37: List[scala.actors.Actor] = List(Test#6d888e, Test#1203c7f, Test#163d118)
1269310737461: Asking for stock abc
1269310737461: Asking for stock def
1269310737464: Got the future back
scala> 1269310737462: Got the future back
1269310737465: Got the future back
1269310739462: Value: 50.0
1269310739462: Value: 50.0
1269310739465: Value: 50.0
scala> new Test("abc").start // Should return instantly
1269310755364: Asking for stock abc
res38: scala.actors.Actor = Test#15b5b68
1269310755365: Got the future back
scala> 1269310755367: Value: 50.0
For simple parallelization, where I throw a bunch of work out to process and then wait for it all to come back, I tend to like to use a Futures pattern.
class ActorExample {
import actors._
import Actor._
class Worker(val id: Int) extends Actor {
def busywork(i0: Int, i1: Int) = {
var sum,i = i0
while (i < i1) {
i += 1
sum += 42*i
}
sum
}
def act() { loop { react {
case (i0:Int,i1:Int) => sender ! busywork(i0,i1)
case None => exit()
}}}
}
val workforce = (1 to 4).map(i => new Worker(i)).toList
def parallelFourSums = {
workforce.foreach(_.start())
val futures = workforce.map(w => w !! ((w.id,1000000000)) );
val computed = futures.map(f => f() match {
case i:Int => i
case _ => throw new IllegalArgumentException("I wanted an int!")
})
workforce.foreach(_ ! None)
computed
}
def serialFourSums = {
val solo = workforce.head
workforce.map(w => solo.busywork(w.id,1000000000))
}
def timed(f: => List[Int]) = {
val t0 = System.nanoTime
val result = f
val t1 = System.nanoTime
(result, t1-t0)
}
def go {
val serial = timed( serialFourSums )
val parallel = timed( parallelFourSums )
println("Serial result: " + serial._1)
println("Parallel result:" + parallel._1)
printf("Serial took %.3f seconds\n",serial._2*1e-9)
printf("Parallel took %.3f seconds\n",parallel._2*1e-9)
}
}
Basically, the idea is to create a collection of workers--one per workload--and then throw all the data at them with !! which immediately gives back a future. When you try to read the future, the sender blocks until the worker's actually done with the data.
You could rewrite the above so that PriceCalculator extended Actor instead, and valueAll coordinated the return of the data.
Note that you have to be careful passing non-immutable data around.
Anyway, on the machine I'm typing this from, if you run the above you get:
scala> (new ActorExample).go
Serial result: List(-1629056553, -1629056636, -1629056761, -1629056928)
Parallel result:List(-1629056553, -1629056636, -1629056761, -1629056928)
Serial took 1.532 seconds
Parallel took 0.443 seconds
(Obviously I have at least four cores; the parallel timing varies quite a bit depending on which worker gets which processor and what else is going on on the machine.)
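To tie this back to the original Trade/PriceCalculator example without actors at all, here is a minimal sketch using plain scala.concurrent Futures (CachingPriceCalculator, the TrieMap cache and the timeout are my own assumptions, not from the answers above) that caches the slow per-stock lookup and values all trades in parallel:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.concurrent.TrieMap

// Cache the per-stock price Future so each slow lookup happens at most once.
class CachingPriceCalculator {
  private val cache = TrieMap.empty[String, Future[Double]]
  def priceFor(stock: String): Future[Double] =
    cache.getOrElseUpdate(stock, Future { Thread.sleep(20); 50.0 })
}

// Value every trade in parallel; trades on the same stock share the cached price lookup.
def valueAll(trades: List[Trade], pc: CachingPriceCalculator): Future[List[(Trade, Double)]] =
  Future.traverse(trades) { t =>
    pc.priceFor(t.stock).map(p => (t, (p - t.price) * t.volume))
  }

// Usage: Await.result(valueAll(trades, new CachingPriceCalculator), 10.seconds)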