Correct usage of mutable/immutable lists - scala

At the moment, I'm trying to understand functional programming in Scala, and I came across a problem I cannot figure out myself.
Imagine the following situation:
You have two classes: Controller and Bot. A Bot is an independent Actor which is initiated by a Controller, does some expensive operation and returns the result to the Controller. The purpose of the Controller is therefore easy to describe: Instantiate multiple objects of Bot, start them and receive the result.
So far, so good; I can implement all this without using any mutable objects.
But what do I do, if I have to store the result that a Bot returns, to use it later as input for another Bot (and later on means that I don't know when at compile time!)?
Doing this with a mutable list or collection is fairly easy, but I add a lot of problems to my code (as we are dealing with concurrency here).
Is it possible, following the FP paradigm, to solve this by using immutable objects (lists...) safely?
BTW, I'm new to FP, so this question might sound stupid, but I cannot figure out how to solve this :)

Actors usually have internal state, being, themselves, mutable beasts. Note that actors are not an FP thing.
The setup you describe seems to rely on a mutable controller, and it is difficult to get around it in a language that is not non-strict by default. Depending on what you are doing, though, you could rely on futures. For example:
case Msg(info) =>
val v1 = new Bot !! Fn1(info)
val v2 = new Bot !! Fn2(info)
val v3 = new Bot !! Fn3(info)
val v4 = new Bot !! Fn4(v1(), v2(), v3())
reply(v4())
In this case, because !! returns a Future, v1, v2 and v3 are computed in parallel. Fn4 receives the applied futures as parameters (v1(), v2(), v3()), so that message is not sent until all three values have been computed.
Likewise, the reply is only sent after v4 has been computed, because that future is applied as well.
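The same shape can be sketched with scala.concurrent.Future from the standard library instead of the !! operator. This is only an illustration under assumptions: fn1 to fn4 and handle are hypothetical stand-ins for the work the Bots would do, not part of the original code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelBots {
  // Hypothetical stand-ins for the expensive work each Bot performs.
  def fn1(info: Int): Int = info + 1
  def fn2(info: Int): Int = info * 2
  def fn3(info: Int): Int = info - 3
  def fn4(a: Int, b: Int, c: Int): Int = a + b + c

  def handle(info: Int): Future[Int] = {
    // Creating the futures before the for-comprehension starts all three
    // computations right away, so they can run in parallel.
    val v1 = Future(fn1(info))
    val v2 = Future(fn2(info))
    val v3 = Future(fn3(info))
    // fn4 runs only after v1, v2 and v3 have all completed.
    for {
      a <- v1
      b <- v2
      c <- v3
    } yield fn4(a, b, c)
  }
}
```

Unlike applying the futures with v1(), the for-comprehension does not block a thread while waiting; it returns a new Future that the caller can map over or reply with.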
A truly functional way of doing these things is functional reactive programming, or FRP for short. It is a different model from actors.
The beauty of Scala, though, is that you can combine such paradigms in whatever way best fits your problem.

This is how an Erlang-like actor could look in Scala:
import scala.annotation.tailrec
case class Actor[State](s: State)(body: State => Option[State]) { // immutable
  @tailrec
  final def loop(s1: State): Unit = {
    body(s1) match {
      case Some(s2) => loop(s2) // continue with the new state
      case None => () // body signalled that the work is finished
    }
  }
  def act = loop(s)
}
def Bot(controller: Actor) = Actor(controller) {
  s =>
    val res = // do the calculations
    controller ! (this, res)
    None // finish work
}
val Controller = Actor(Map[Bot, ResultType]()) { s =>
  // start bots, perhaps using results already stored in s
  if (
    // time to stop, e.g. all bots already finished
  )
    None
  else
    receive {
      case (bot, res) => Some(s + (bot -> res)) // a bot has reported result
    }
}
Controller.act
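The core idea of the sketch above, threading an immutable state through a tail-recursive loop, can be tried without any actor library. The following is a minimal illustration that assumes the reported results are already available as a list; Result, botId and the Int payload are hypothetical names chosen for the example.

```scala
import scala.annotation.tailrec

object ImmutableController {
  // Hypothetical message type: a bot's id paired with the result it reports.
  final case class Result(botId: String, value: Int)

  // Folds incoming results into an immutable Map. Each step builds a new Map
  // and passes it to the next iteration; nothing is mutated in place.
  @tailrec
  def loop(state: Map[String, Int], inbox: List[Result]): Map[String, Int] =
    inbox match {
      case Result(id, value) :: rest => loop(state + (id -> value), rest)
      case Nil                       => state // no more messages: finish
    }
}
```

In a real actor the inbox would be a receive call rather than a list, but the state handling stays the same: the only "current" state is the argument of the running loop.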

Related

Looking for something like a TestFlow analogous to TestSink and TestSource

I am writing a class that takes a Flow (representing a kind of socket) as a constructor argument and that allows sending messages and waiting for the respective answers asynchronously by returning a Future. Example:
class SocketAdapter(underlyingSocket: Flow[String, String, _]) {
def sendMessage(msg: MessageType): Future[ResponseType]
}
This is not necessarily trivial because there may be other messages in the socket stream that are irrelevant, so some filtering is required.
In order to test the class I need to provide something like a "TestFlow" analogous to TestSink and TestSource. In fact I can create a flow by combining both. However, the problem is that I only obtain the actual probes upon materialization and materialization happens inside the class under test.
The problem is similar to the one I described in this question. My problem would be solved if I could materialize the flow first and then pass it to a client to connect to it. Again, I'm thinking about using MergeHub and BroadcastHub and again I see the problem that the resulting stream would behave differently because it is not linear anymore.
Maybe I misunderstood how a Flow is supposed to be used. In order to feed messages into the flow when sendMessage() is called, I need a certain kind of Source anyway. Maybe a Source.actorRef(...) or Source.queue(...), so I could pass in the ActorRef or SourceQueue directly. However, I'd prefer if this choice was up to the SocketAdapter class. Of course, this applies to the Sink as well.
It feels like this is a rather common case when working with streams and sockets. If it is not possible to create a "TestFlow" like I need it, I'm also happy with some advice on how to improve my design and make it better testable.
Update: I browsed through the documentation and found SourceRef and SinkRef. It looks like these could solve my problem but I'm not sure yet. Is it reasonable to use them in my case or are there any drawbacks, e.g. different behaviour in the test compared to production where there are no such refs?
Indirect Answer
The nature of your question suggests a design flaw which you are bumping into at testing time. The answer below does not address the issue in your question, but it demonstrates how to avoid the situation altogether.
Don't Mix Business Logic with Akka Code
Presumably you need to test your Flow because you have mixed a substantial amount of logic into the materialization. Let's assume you are using raw sockets for your IO. Your question suggests that your flow looks like:
val socketFlow : Flow[String, String, _] = {
val socket = new Socket(...)
//business logic for IO
}
You need a complicated test framework for your Flow because your Flow itself is also complicated.
Instead, you should separate out the logic into an independent function that has no akka dependencies:
type MessageProcessor = MessageType => ResponseType
object BusinessLogic {
val createMessageProcessor : (Socket) => MessageProcessor = {
//business logic for IO
}
}
Now your flow can be very simple:
val socket : Socket = new Socket(...)
val socketFlow = Flow[MessageType].map(BusinessLogic.createMessageProcessor(socket))
As a result, your unit tests can work exclusively with createMessageProcessor. There is no need to test the Akka Flow, because it is a thin veneer around the complicated logic, which is tested independently.
Don't Use Streams For Concurrency Around 1 Element
The other big problem with your design is that SocketAdapter is using a stream to process just 1 message at a time. This is incredibly wasteful and unnecessary (you're trying to kill a mosquito with a tank).
Given the separated business logic your adapter becomes much simpler and independent of akka:
class SocketAdapter(messageProcessor : MessageProcessor) {
def sendMessage(msg: MessageType): Future[ResponseType] = Future {
messageProcessor(msg)
}
}
Note how easy it is to use Future in some instances and Flow in other scenarios depending on the need. This comes from the fact that the business logic is independent of any concurrency framework.
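To make the testing benefit concrete, here is a small sketch of how such an adapter could be exercised with a stub processor and no streams at all. MessageType and ResponseType are left undefined in the answer, so the case classes below, the echo stub and the demo helper are assumptions made only for this example.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object AdapterSketch {
  // Hypothetical message types; the original leaves them undefined.
  final case class MessageType(body: String)
  final case class ResponseType(body: String)
  type MessageProcessor = MessageType => ResponseType

  // A stub processor standing in for the socket-backed business logic.
  val echo: MessageProcessor = msg => ResponseType(msg.body.reverse)

  class SocketAdapter(messageProcessor: MessageProcessor)(implicit ec: ExecutionContext) {
    def sendMessage(msg: MessageType): Future[ResponseType] =
      Future(messageProcessor(msg))
  }

  // Runs one message through the adapter and waits for the answer.
  def demo(): ResponseType = {
    implicit val ec: ExecutionContext = ExecutionContext.global
    Await.result(new SocketAdapter(echo).sendMessage(MessageType("abc")), 5.seconds)
  }
}
```

Because the adapter only depends on a plain function, a test can swap in any stub, including one that throws, without materializing anything.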
This is what I came up with using SinkRef and SourceRef:
object TestFlow {
def withProbes[In, Out](implicit actorSystem: ActorSystem,
actorMaterializer: ActorMaterializer)
:(Flow[In, Out, _], TestSubscriber.Probe[In], TestPublisher.Probe[Out]) = {
val f = Flow.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
val ((sinkRefFuture, (inProbe, outProbe)), sourceRefFuture) =
StreamRefs.sinkRef[In]()
.viaMat(f)(Keep.both)
.toMat(StreamRefs.sourceRef[Out]())(Keep.both)
.run()
val sinkRef = Await.result(sinkRefFuture, 3.seconds)
val sourceRef = Await.result(sourceRefFuture, 3.seconds)
(Flow.fromSinkAndSource(sinkRef, sourceRef), inProbe, outProbe)
}
}
This gives me a flow I can completely control with the two probes but I can pass it to a client that connects source and sink later, so it seems to solve my problem.
The resulting Flow should only be used once, so it differs from a regular Flow that is rather a flow blueprint and can be materialized several times. However, this restriction applies to the web socket flow I am mocking anyway, as described here.
The only issue I still have is that some warnings are logged when the ActorSystem terminates after the test. This seems to be due to the indirection introduced by the SinkRef and SourceRef.
Update: I found a better solution without SinkRef and SourceRef by using mapMaterializedValue():
def withProbesFuture[In, Out](implicit actorSystem: ActorSystem,
ec: ExecutionContext)
: (Flow[In, Out, _],
Future[(TestSubscriber.Probe[In], TestPublisher.Probe[Out])]) = {
val (sinkPromise, sourcePromise) =
(Promise[TestSubscriber.Probe[In]](), Promise[TestPublisher.Probe[Out]]())
val flow =
Flow
.fromSinkAndSourceMat(TestSink.probe[In], TestSource.probe[Out])(Keep.both)
.mapMaterializedValue { case (inProbe, outProbe) =>
sinkPromise.success(inProbe)
sourcePromise.success(outProbe)
()
}
val probeTupleFuture = sinkPromise.future
.flatMap(sink => sourcePromise.future.map(source => (sink, source)))
(flow, probeTupleFuture)
}
When the class under test materializes the flow, the Future is completed and I receive the test probes.
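The Promise mechanics behind this can be seen in isolation with a standard-library-only sketch. It does not use Akka: Probe and blueprintWithHandle are hypothetical names, and the "blueprint" is just a function that someone else runs later, standing in for materialization.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._

object DeferredHandle {
  // Hypothetical stand-in for a probe that only exists after "materialization".
  final case class Probe(name: String)

  // Returns a blueprint (a function to run later) together with a Future that
  // completes once somebody else runs that blueprint.
  def blueprintWithHandle(): (() => Unit, Future[(Probe, Probe)]) = {
    val inPromise  = Promise[Probe]()
    val outPromise = Promise[Probe]()
    val materialize = () => {
      inPromise.success(Probe("in"))
      outPromise.success(Probe("out"))
      ()
    }
    // zip pairs the two futures; it expresses the same thing as the
    // flatMap/map combination, just more compactly.
    (materialize, inPromise.future.zip(outPromise.future))
  }
}
```

Running the blueprint a second time throws an IllegalStateException, because a Promise can only be completed once. That matches the single-use restriction discussed above.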

How to enforce a contract through a hidden/internal state while being purely functional?

Use case
Visitors of a website can send me an email by providing their email address and message. To avoid spam, only 2 emails per minute (an arbitrary limit) are allowed before being rate-limited.
Note that this is a learning exercise for me to get more used to functional programming practices so while that might seem overkill, it's a step for me to extend this to more complex systems.
Implementation
To do so, the ContactMailer class I have so far exposes a single method, send, which sends an email based on provided info. It handles all of preparing the mailer/email, enforcing the rate-limit contract, and actually sending the email (using courier):
import courier._, Defaults._
import scala.collection.mutable.Queue
import scala.concurrent.Future
case class ContactMailer(host: String, port: Int, username: String, password: String) {
private val mailer = Mailer(host, port).auth(true).as(username, password).startTtls(true)()
private val envelope = Envelope.from("no-reply" `@` "example.com").to("astorije" `@` "example.com")
private val queue = Queue[Long]()
def send(from: String, subject: String, content: String): Future[String] = {
val now = System.currentTimeMillis()
// If queue is full and oldest known message is < 1 minute ago, rate
// limit, otherwise send email, drop oldest known timestamp and enqueue
// the new one
if (queue.length == 2) {
val oldest = queue.head
if (now - oldest < 60 * 1000) return Future("message rate limited")
else queue.dequeue()
}
queue.enqueue(now)
mailer(envelope.replyTo(from.addr).subject(subject).content(Text(content)))
.map(_ => "message delivered")
.recover { case ex => s"message failed: ${ex}" }
}
}
And consumer (another part of the application) calls it like this:
scala> val mailer = ContactMailer("smtp.example.com", 25, "username", "password")
mailer: ContactMailer = ContactMailer(smtp.example.com,25,username,password)
scala> mailer.send("foo@bar.com", "One", "...").foreach(println(_))
message delivered
scala> mailer.send("foo@bar.com", "Two", "...").foreach(println(_))
message delivered
scala> mailer.send("foo@bar.com", "Three", "...").foreach(println(_))
message rate limited
Problem
Perfect, it works. But because I have learned and exclusively used OOP so far, this has all its characteristics: a mutable state, side effects, no referential transparency as calling multiple times with the same inputs can result in different outputs, etc.
How can I keep this rate-limit in a purely functional programming style?
If I extract this internal state (the timestamp queue) outside of the mailer and request the consumer to provide it such as def send(previousQueue: Queue[Long], from: String, ...): (Queue[Long], Future[String]), how can I ensure that the consumer will always respect this rate limit and not send an empty queue to never be rate-limited?
Is there a way to keep ContactMailer focused on what it should do (send an email), and extract this rate limiting into a less specialized layer (a generic rate limiter, whatever it is trying to limit)? Is it a good idea in the first place?
I read about a lot of generic approaches for this, and still don't know what to choose from: IO monad? State monad? Free monad? Actor system? It seems to me the last one would only shift the problem and be inappropriate in this limited context.
In general, what would be a good structure for this use case in an FP manner?
In general, I do not know how to approach this. There are a lot of resources out there, but they are either too simplistic and explain the basics that can hardly apply in a real-world situation, or too abstract and theoretical for my little experience to translate them into this example.
I obviously have complete freedom to update the signature of the class(es)/function(s) since I control the whole sequence of operations myself.
I hope this will not be flagged as opinion-based. I understand why it would feel that way, but I am actually stuck on how to get to a better, concrete implementation. :)
Your system has two interactions with the outside world, sending emails and rate-limiting actions based on time. Both of these interactions have side effects.
The simplest way to model these as external services is with the IO monad.
// Assuming sending email cannot fail, IO[Either[EmailError, Unit]] otherwise
def sendEmail(email: Email): IO[Unit]
def rateLimit(): IO[Boolean]
Your implementation now simply "combines" these two:
case object RateLimitReached
def mySend(email: Email): IO[Either[RateLimitReached.type, Unit]] =
for {
token <- rateLimit()
result <- if (token)
IO(Left(RateLimitReached))
else
sendEmail(email).map(Right)
} yield result
If you call runAsync (assuming you use cats.effect.IO) on the result of mySend right away, you might as well use Future instead of IO; the two will be equivalent for all intents and purposes.
How can I keep this rate-limit in a purely functional programming style?
Rate limiting consists of reading the time (a side effect) and mutating a local counter. You could do the latter with a State monad, but that would IMO be over-engineered in Scala.
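Without reaching for a State monad, the counter part can still be written as a pure state transition that takes the current time as an argument. The sketch below is one possible shape, not the answer's code: RateLimiter, tryAcquire and the constants are hypothetical, and it uses a sliding window that drops every expired timestamp rather than only the oldest one.

```scala
import scala.collection.immutable.Queue

object RateLimiter {
  val MaxEvents = 2          // same arbitrary limit as in the question
  val WindowMillis = 60000L  // one minute

  // Pure state transition: given the current time and the recent timestamps,
  // returns the updated timestamps and whether the action is allowed.
  def tryAcquire(now: Long, recent: Queue[Long]): (Queue[Long], Boolean) = {
    // Discard timestamps that have fallen out of the window.
    val live = recent.dropWhile(t => now - t >= WindowMillis)
    if (live.size >= MaxEvents) (live, false)
    else (live.enqueue(now), true)
  }
}
```

Reading the clock and storing the returned queue remain side effects, but they can live in one small place (for instance inside an IO) while the decision logic stays referentially transparent and trivially testable.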
I read about a lot of generic approaches for this, and still don't know what to choose from: IO monad? State monad? Free monad? Actor system? It seems to me the last one would only shift the problem and be inappropriate in this limited context.
IO is the default answer. Actors are based on the Any => Unit function type, so if you like purity and type safety you are not going to be friends with actors. The main benefit of Free over IO is that you get more precise type signatures. Returning IO[Result] means you could be doing anything inside; with Free you can be as precise as you like:
def sendEmail(email: Email): Free[EmailEffect, Unit]
def rateLimit(): Free[ReadTheTime, Boolean]
// In Free[F, A], F lists the side effects and A is the return type
type MyEffect[X] = Either[EmailEffect[X], ReadTheTime[X]]
def mySend(email: Email): Free[MyEffect, Either[RateLimitReached.type, Unit]] =
But that extra level of indirection comes with some complexity, and a lot of boilerplate because in the end everything still needs to be interpreted to an IO[Either[RateLimitReached.type, Unit]].

Play 2.5.x (Scala) -- How does one put a value obtained via wsClient into a (lazy) val

The use case is actually fairly typical. A lot of web services use authorization tokens that you retrieve at the start of a session and you need to send those back on subsequent requests.
I know I can do it like this:
lazy val myData = {
val request = ws.url("/some/url").withAuth(user, password, WSAuthScheme.BASIC).withHeaders("Accept" -> "application/json")
Await.result(request.get().map{x => x.json }, 120.seconds)
}
That just feels wrong, as all the docs say never to use Await.
Is there a Future/Promise Scala style way of handling this?
I've found .onComplete which allows me to run code upon the completion of a Promise however without using a (mutable) var I see no way of getting a value in that scope into a lazy val in a different scope. Even with a var there is a possible timing issue -- hence the evils of mutable variables :)
Any other way of doing this?
Unfortunately, there is no way to make this non-blocking - lazy vals are designed to be synchronous and to block any thread accessing them until they are completed with a value (internally a lazy val is represented as a simple synchronized block).
The Future/Promise way in Scala would be to expose a Future[T] or a Promise[T] instead of a val x: T. That, however, brings the overhead of an ExecutionContext and a map on every use of the value, and the better resource utilization may not be worth the loss of readability in all cases. If you use the value extensively in many parts of your application, it may be acceptable to leave the Await there.
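For comparison, this is roughly what the Future[T] alternative could look like: the lazy val holds the Future itself, so the request is still issued only once, but nothing blocks. The sketch uses only the standard library; fetchToken, authorizedCall and the token value are hypothetical stand-ins for the wsClient call and its consumers.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TokenCache {
  // Counts how often the (simulated) remote call is made.
  val fetchCount = new AtomicInteger(0)

  // Hypothetical stand-in for the wsClient request in the question.
  private def fetchToken(): Future[String] = Future {
    fetchCount.incrementAndGet()
    "token-123"
  }

  // The lazy val stores the Future, not the value: the request is started on
  // first access and every later access reuses the same Future.
  lazy val token: Future[String] = fetchToken()

  // Consumers map over the Future instead of awaiting it.
  def authorizedCall(path: String): Future[String] =
    token.map(t => s"GET $path with $t")
}
```

The cost mentioned above is visible here: every consumer has to return a Future and needs an ExecutionContext in scope. Note also that a failed Future stays cached, so a real implementation would need some retry or invalidation strategy.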

What are advantages of a Twitter Future over a Scala Future?

I know a lot of reasons for Scala Future to be better. Are there any reasons to use Twitter Future instead? Except the fact Finagle uses it.
Disclaimer: I worked at Twitter on the Future implementation. A little bit of context, we started our own implementation before Scala had a "good" implementation of Future.
Here're the features of Twitter's Future:
Some method names are different and Twitter's Future has some new helper methods in the companion.
For example, Future.join(f1, f2) can work on heterogeneous Future types.
Future.join(
Future.value(new Object), Future.value(1)
).map {
case (o: Object, i: Int) => println(o, i)
}
o and i keep their types; they are not cast to the least common supertype, Any.
A chain of onSuccess is guaranteed to be executed in order:
e.g.:
f.onSuccess { _ =>
  println(1) // #1
} onSuccess { _ =>
  println(2) // #2
}
#1 is guaranteed to be executed before #2
The threading model is a little different. There is no notion of ExecutionContext: the thread that sets the value in a Promise (the mutable implementation of a Future) is the one that executes all the computations in the future graph.
e.g.:
val f1 = new Promise[Int]
f1.map(_ * 2).map(_ + 1)
f1.setValue(2) // <- this thread also executes *2 and +1
There's a notion of interruption/cancellation. With Scala's Futures, information only flows in one direction; with Twitter's Future, you can notify a producer of some information (not necessarily a cancellation). In practice, it's used in Finagle to propagate the cancellation of an RPC. Because Finagle also propagates the cancellation across the network and because Twitter has a huge fan-out of requests, this actually saves lots of work.
class MyMessage extends Exception
val p = new Promise[Int]
p.setInterruptHandler {
case ex: MyMessage => println("Receive MyMessage")
}
val f = p.map(_ + 1).map(_ * 2)
f.raise(new MyMessage) // print "Receive MyMessage"
Until recently, Twitter's Future was the only one to implement efficient tail recursion (i.e. you can have a recursive function that calls itself without blowing up your call stack). It has been implemented in Scala 2.11+ (I believe).
As far as I can tell, the main difference that could go in favor of using Twitter's Future is that it can be cancelled, unlike Scala's Future.
Also, there used to be some support for tracing the call chains (as you probably know plain stack traces are close to being useless when using Futures). In other words, you could take a Future and tell what chain of map/flatMap produced it. But the idea has been abandoned if I understand correctly.

Akka actor forward message with continuation

I have an actor which takes the result from another actor and applies some check on it.
class Actor1(actor2:Actor2) {
def receive = {
case SomeMessage =>
val r = actor2 ? NewMessage()
r.map(someTransform).pipeTo(sender)
}
}
Now, if I make an ask of Actor1, two futures are generated, which doesn't seem very efficient. Is there a way to provide a forward with some kind of continuation, or some other approach I could use here?
case SomeMessage => actor2.forward(NewMessage, someTransform)
Futures are executed in an ExecutionContext, which is similar to a thread pool. Creating a new future is not as expensive as creating a new thread, but it has its cost. The best way to work with futures is to create as many as needed and compose them so that things that can be computed in parallel are computed in parallel when the necessary resources are available. This way you will make the best use of your machine.
You mentioned that the Akka documentation discourages excessive use of futures. I don't know where you read this, but what I think it means is that you should prefer transforming futures to creating your own. This is exactly what you are doing by using map. Also, it may mean that if you create a future where it is not needed, you are adding unnecessary overhead.
In your case you have a call that returns a future, and you need to apply someTransform and return the result. Using map is the way to go.
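The difference between transforming a future and creating a new one can be shown with the standard library alone. In this sketch askActor2 is a hypothetical stand-in for actor2 ? NewMessage() and someTransform is an arbitrary example function; neither comes from a real Akka API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TransformSketch {
  // Hypothetical stand-in for `actor2 ? NewMessage()`: some call that already
  // hands back a Future.
  def askActor2(): Future[Int] = Future.successful(20)

  // Example transformation applied to the reply.
  def someTransform(n: Int): Int = n + 1

  // Preferred: transform the Future that already exists.
  def handle(): Future[Int] = askActor2().map(someTransform)

  // Wasteful alternative: a brand-new Future whose body blocks a thread
  // while waiting for the first one.
  def handleBlocking(): Future[Int] =
    Future(someTransform(Await.result(askActor2(), 5.seconds)))
}
```

Both variants produce the same value, but map only registers a callback on the existing future, whereas the second variant occupies a pool thread for the whole wait.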