I get inconsistent results from Monix firstOptionL - race condition? - scala

I get an intermittent missing value from repeated calls to the same (MongoDB) database call which I've converted into an Observable. I've removed all database code to get a minimal test case which only has the Monix bits and I'm still missing values occasionally - usually one or two per 2,000 tests.
According to the docs ConcurrentSubject means "one doesn't need to follow the back-pressure contract", but I get similar failures whether I do or not.
import monix.eval.Task
import monix.reactive.{MulticastStrategy, Observable}
import monix.reactive.subjects.ConcurrentSubject
import org.scalatest.FunSuite
import scala.concurrent.Await
import scala.concurrent.duration.Duration
class Test_JustMonix extends FunSuite {
implicit val scheduler = monix.execution.Scheduler.global
def build(): Observable[Boolean] = {
val subject = ConcurrentSubject(MulticastStrategy.publish[Boolean])
subject.doAfterSubscribe {
Task.eval {
subject.onNext(true)
subject.onComplete()
}
}
}
test("just monix") {
(0 until 20).foreach { loop =>
println(s"loop $loop")
val tOpts = (0 until 100).map { _ => build().firstOptionL }
val tDone = Task.gather(tOpts).map { list =>
val emptyCount = list.count(_.isEmpty)
assert(emptyCount === 0)
}
Await.result(tDone.runToFuture, Duration.Inf)
}
println("Finished")
}
}
On some runs, all 20x100 loops complete correctly - firstOptionL isDefined for all 2,000 results. However, more than 50% of the time the assert(emptyCount === 0) triggers when the value is 1 or sometimes 2, indicating that occasionally I am getting a None value, as if onComplete was occurring before onNext?
This can happen in any of the 20 loops, so it looks like either a race condition or I am misunderstanding the required input. I've tried pretty much all subjects - PublishSubject, with and without BufferedSubscriber and all give similar results.
I've also tried delaying the onComplete until the Ack via
subject.onNext(true).map(_=> subject.onComplete())
and that seems to fail slightly sooner.
I've also tried MulticastStrategy.replay with no difference.
I'm using Monix 3.0.0-RC3 on Scala 2.12.8.

Related

How to read only Successful values from a Seq of Futures

I am learning Akka/Scala and am trying to read only those Futures that succeeded from a Seq[Future[Int]], but can't get anything to work.
I simulated an array of 10 Future[Int] some of which fail depending on the value FailThreshold takes (all fail for 10 and none fail for 0).
I then try to read them into an ArrayBuffer (could not find a way to return immutable structure with the values).
Also, there isn't a filter on Success/Failure so had to run an onComplete on each future and update buffer as a side-effect.
Even when FailThreshold=0 and every Future in the Seq is a Success, the array buffer is sometimes empty, and different runs return arrays of different sizes.
I tried a few other suggestions from the web, like using Future.sequence on the list, but this throws an exception if any of the futures fails.
import akka.actor._
import akka.pattern.ask
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import akka.util.Timeout
import scala.util.{Failure, Success}
import concurrent.ExecutionContext.Implicits.global
case object AskNameMessage
implicit val timeout = Timeout(5, SECONDS)
val FailThreshold = 0
class HeyActor(num: Int) extends Actor {
def receive = {
case AskNameMessage => if (num<FailThreshold) {Thread.sleep(1000);sender ! num} else sender ! num
}
}
class FLPActor extends Actor {
def receive = {
case t: IndexedSeq[Future[Int]] => {
println(t)
val b = scala.collection.mutable.ArrayBuffer.empty[Int]
t.foldLeft( b ){ case (bf,ft) =>
ft.onComplete { case Success(v) => bf += ft.value.get.get }
bf
}
println(b)
}
}
}
val system = ActorSystem("AskTest")
val flm = (0 to 10).map( (n) => system.actorOf(Props(new HeyActor(n)), name="futureListMake"+(n)) )
val flp = system.actorOf(Props(new FLPActor), name="futureListProcessor")
// val delay = akka.pattern.after(500 millis, using=system.scheduler)(Future.failed( throw new IllegalArgumentException("DONE!") ))
val delay = akka.pattern.after(500 millis, using=system.scheduler)(Future.successful(0))
val seqOfFtrs = (0 to 10).map( (n) => Future.firstCompletedOf( Seq(delay, flm(n) ? AskNameMessage) ).mapTo[Int] )
flp ! seqOfFtrs
The receive in FLPActor mostly gets
Vector(Future(Success(0)), Future(Success(1)), Future(Success(2)), Future(Success(3)), Future(Success(4)), Future(Success(5)), Future(Success(6)), Future(Success(7)), Future(Success(8)), Future(Success(9)), Future(Success(10)))
but the array buffer b has a varying number of values and is empty at times.
Can someone please point out the gaps here:
why would the array buffer have varying sizes even when all Futures have resolved to Success, and
what is the correct pattern to use when we want to ask different actors with a timeout and use only those asks that have successfully returned for further processing?
Instead of directly sending the IndexedSeq[Future[Int]], you should transform to Future[IndexedSeq[Int]] and then pipe it to the next actor. You don't send the Futures directly to an actor. You have to pipe it.
HeyActor can stay unchanged.
After
val seqOfFtrs = (0 to 10).map( (n) => Future.firstCompletedOf( Seq(delay, flm(n) ? AskNameMessage) ).mapTo[Int] )
do a recover, and use Future.sequence to turn it into one Future:
val oneFut = Future.sequence(seqOfFtrs.map(f=>f.map(Some(_)).recover{ case (ex: Throwable) => None})).map(_.flatten)
If you don't understand the business with Some, None, and flatten, then make sure you understand the Option type. One way to remove values from a sequence is to map values in the sequence to Option (either Some or None) and then to flatten the sequence. The None values are removed and the Some values are unwrapped.
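The Some/None/flatten idea can be tried without any actors. The following is a minimal sketch using only the Scala standard library; the names attempts and successes are made up for illustration, and the failing Future stands in for an ask that timed out.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Three attempts, one of which fails (illustrative stand-ins for the asks)
val attempts: Seq[Future[Int]] = Seq(
  Future.successful(1),
  Future.failed(new RuntimeException("boom")),
  Future.successful(3)
)

// Map each success to Some, each failure to None, then drop the Nones
val successes: Future[Seq[Int]] =
  Future
    .sequence(attempts.map(_.map(Some(_)).recover { case _: Throwable => None }))
    .map(_.flatten)

// Blocking here only so the sketch can show a value
val kept: Seq[Int] = Await.result(successes, 1.second)
```

Because every element is recovered before Future.sequence sees it, the combined Future no longer fails when a single element does.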
After you have transformed your data into a single Future, pipe it over to FLPActor:
oneFut pipeTo flp
FLPActor should be rewritten with the following receive function:
def receive = {
case printme: IndexedSeq[Int] => println(printme)
}
In Akka, modifying an actor's state from a Future or from the onComplete of a Future is a big no-no. In the worst case, it results in race conditions. Remember that a Future runs on a thread supplied by its execution context, not on the thread that is currently processing the actor's message, so running a Future inside an actor means you have concurrent work being done in different threads. Having the Future directly modify some state in your actor while the actor is also processing that state is a recipe for disaster. In Akka, all changes to an actor's state should happen inside the actor's own message processing. If you have some work done in a Future and need to access the result from the actor, you pipe it to that actor. The pipeTo pattern is functional, correct, and safe for accessing the finished computation of a Future.
To answer your question about why FLPActor is not printing out the IndexedSeq correctly: you are printing out the ArrayBuffer before your Futures have been completed. onComplete isn't the right idiom to use in this case, and you should avoid it in general as it isn't good functional style.
Don't forget the import akka.pattern.pipe for the pipeTo syntax.

What's the best way to wrap a monix Task with a start time and end time print the difference?

This is what I'm trying right now but it only prints "hey" and not metrics.
I don't want to add metric related stuff in the main function.
import java.util.Date
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Await
import scala.concurrent.duration.Duration
class A {
def fellow(): Task[Unit] = {
val result = Task {
println("hey")
Thread.sleep(1000)
}
result
}
}
trait AA extends A {
override def fellow(): Task[Unit] = {
println("AA")
val result = super.fellow()
val start = new Date()
result.foreach(e => {
println("AA", new Date().getTime - start.getTime)
})
result
}
}
val a = new A with AA
val res: Task[Unit] = a.fellow()
Await.result(res.runAsync, Duration.Inf)
You can describe a function such as this:
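import java.util.concurrent.TimeUnit
import monix.eval.Task
def measure[A](task: Task[A], logMillis: Long => Task[Unit]): Task[A] =
Task.deferAction { sc =>
// Read the Scheduler's monotonic clock when the task starts
val start = sc.clockMonotonic(TimeUnit.MILLISECONDS)
val stopTimer = Task.suspend {
val end = sc.clockMonotonic(TimeUnit.MILLISECONDS)
logMillis(end - start)
}
// In Monix 3.x, redeemWith takes the error handler first, then the success handler
task.redeemWith(
e => stopTimer.flatMap(_ => Task.raiseError(e)),
a => stopTimer.map(_ => a)
)
}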
Some pieces of advice:
Task values should be pure, along with the functions returning Tasks — functions that trigger side effects and return Task as results are broken
Task is not a 1:1 replacement for Future; when describing a Task, all side effects should be suspended (wrapped) in Task
foreach triggers the Task's evaluation and that's not good, because it triggers the side effects; I was thinking of deprecating and removing it, since its presence is tempting
stop using trait inheritance and just use plain functions — unless you deeply understand OOP and subtyping, it's best to avoid it if possible; and if you're into the Cake pattern, stop doing it and maybe join a support group 🙂
never measure time duration via new Date(), you need a monotonic clock for that and on top of the JVM that's System.nanoTime, which can be accessed via Monix's Scheduler by clockMonotonic, as exemplified above, the Scheduler being given to you via deferAction
stop blocking threads, because that's error prone — instead of doing Thread.sleep, do Task.sleep and all Await.result calls are problematic, unless they are in main or in some other place where dealing with asynchrony is not possible
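The point about monotonic clocks can be illustrated outside of Task with plain Scala. This is only a sketch: the helper name timedMillis is made up here, and it measures an eager block rather than a suspended Task.

```scala
// Hypothetical helper: runs the block once and returns its result together
// with the elapsed time in milliseconds, measured with System.nanoTime
def timedMillis[A](body: => A): (A, Long) = {
  val start = System.nanoTime()
  val result = body
  val elapsedMillis = (System.nanoTime() - start) / 1000000L
  (result, elapsedMillis)
}

// Sum of 1..1,000,000, using Long to avoid Int overflow
val (total, millis) = timedMillis((1L to 1000000L).sum)
println(s"sum = $total, took $millis ms")
```

Unlike new Date(), System.nanoTime is not affected by wall-clock adjustments, so the difference between two readings cannot go negative.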
Hope this helps.
Cheers,
Like @Pierre mentioned, the latest version of Monix Task has Task.timed, so inside a for-comprehension you can do
timed <- task.timed
(duration, t) = timed
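A slightly fuller sketch of the same idea, assuming a Monix version where Task has a timed method that returns the elapsed FiniteDuration paired with the result (3.2 or later, as far as I know). The names work and program are illustrative only.

```scala
import monix.eval.Task
import monix.execution.Scheduler.Implicits.global
import scala.concurrent.Await
import scala.concurrent.duration._

// Illustrative task: a non-blocking sleep, then a value
val work: Task[Int] = Task.sleep(100.millis).map(_ => 42)

// timed yields (elapsed, value); log the duration and keep the value
val program: Task[Int] = work.timed.flatMap { case (elapsed, value) =>
  Task(println(s"took ${elapsed.toMillis} ms")).map(_ => value)
}

// Await is used only at the edge of the program
val result: Int = Await.result(program.runToFuture, 5.seconds)
```

The logging stays out of the business logic, which was the original goal: work knows nothing about metrics.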

Usage of Ask pattern causing the creation of default-akka.actor.default-dispatcher threads for each request

Information: You might want to skip directly to Edit 4 after the introduction.
Recently I wrote a simple scala server app. The app mostly wrote incoming data to the database or retrieved it. This is my first scala-akka application.
When I deployed the app to my server it failed after about a day. I realized from the simple statistics provided by DigitalOcean that the CPU usage was rising in a linear fashion if I requested data from the server. If I didn't request anything from the server, the CPU usage was about constant but didn't fall from its previous level.
I connected the app to visualvm and saw that the number of threads is either constant if I don't do anything with the app (I), or grows if I send stuff to the server (II) in a saw-like fashion.
There is an obvious correlation here between the number of threads and CPU usage, which makes sense.
When I checked the threads tab I saw that most threads are default-akka.actor.default-dispatcher threads
They also don't seem to be doing much.
What could cause this sort of problem? How do I solve it?
Ad. Edit4: I think I found the source of the problem, but I still don't understand why it happens, and how I should solve it.
PS: I must admit that the screenshots are not from the application which failed. I don't have any from the original failure. However the only difference between this program and the one that failed is that in application.conf I added:
actor {
default-dispatcher {
fork-join-executor {
# Settings this to 1 instead of 3 seems to improve performance.
parallelism-factor = 2.0
parallelism-max = 24
task-peeking-mode = FIFO
}
}
}
This seems to have slowed the speed by which the number of threads rises, but didn't solve the problem.
Edit: Fragment of WriterActor usage (RestApi)
trait RestApi extends CassandraCluster {
import models._
import cassandraDB.{WriterActor, ReaderActor}
implicit def system: ActorSystem
implicit def materializer: ActorMaterializer
implicit def ec: ExecutionContext
implicit val timeout = Timeout(20 seconds)
val cassandraWriterWorker = system.actorOf(Props(new WriterActor(cluster)), "cassandra-writer-actor")
val cassandraReaderWorker = system.actorOf(Props(new ReaderActor(cluster)), "cassandra-reader-actor")
...
def cassandraReaderCall(message: Any): ToResponseMarshallable = message match {
//...
case message: GetActiveContactsByPhoneNumber => (cassandraReaderWorker ? message)(2 seconds).mapTo[Vector[String]].map(result => Json.obj("active_contacts" -> result))
case _ => StatusCodes.BadRequest
}
def confirmedWriterCall(message: Any) = {
(cassandraWriterWorker ? message).mapTo[Boolean].map(result => result)
}
val apiKeyStringV1 = "test123"
val route =
...
path("contacts") {
parameter('apikey ! apiKeyStringV1) {
post {
entity(as[Contacts]){ contact: Contacts =>
cassandraWriterWorker ! contact
complete(StatusCodes.OK)
}
} ~
get {
parameter('phonenumber) { phoneNumber: String =>
complete(cassandraReaderCall(GetActiveContactsByPhoneNumber(phoneNumber)))
}
}
}
} ~
} ~
path("log"/ "gps") {
parameter('apikey ! apiKeyStringV1) {
(post & entity(as[GpsLog])) { gpsLog =>
cassandraWriterWorker ! gpsLog
complete(StatusCodes.OK)
}
}
}
}
}
}
Edit 2: Writer Worker relevant code.
I didn't post all the methods since they are all basically the same. But here you can find the whole file
import java.util.UUID
import akka.actor.Actor
import com.datastax.driver.core.Cluster
import models._
class WriterActor(cluster: Cluster) extends Actor{
import scala.collection.JavaConversions._
val session = cluster.connect(Keyspaces.akkaCassandra)
// ... other inserts
val insertGpsLog = session.prepare("INSERT INTO gps_logs(id, phone_number, lat, long, time) VALUES (?,?,?,?,?);")
// ...
def insertGpsLog(phoneNumber: String, locWithTime: LocationWithTime): Unit =
session.executeAsync(insertGpsLog.bind(UUID.randomUUID().toString, phoneNumber, new java.lang.Double(locWithTime.location.lat),
new java.lang.Double(locWithTime.location.long), new java.lang.Long(locWithTime.time)))
def receive: Receive = {
// ...
case gpsLog: GpsLog => gpsLog.locationWithTimeLog.foreach(locWithTime => insertGpsLog(gpsLog.phoneNumber, locWithTime))
}
}
Edit 3. Misdiagnosis of excessive thread use.
I'm afraid I misdiagnosed the origin of the problem. Later on, I had added a request for data after the write and forgot about it. When I removed it the number of threads stopped growing. So this is the most likely place where the mistake was made. I updated the trait where the ReaderActor is used and also added the relevant code of the ReaderActor below.
object ReaderActor {
// ...
case class GetActiveContactsByPhoneNumber(phoneNumber: String)
}
class ReaderActor(cluster: Cluster) extends Actor {
import models._
import ReaderActor._
import akka.pattern.pipe
import scala.collection.JavaConversions._
import cassandra.resultset._
import context.dispatcher
val session = cluster.connect(Keyspaces.akkaCassandra)
def buildActiveContactByPhoneNumberResponse(r: Row): String = {
val phoneNumber = r.getString(ContactsKeys.phoneNumber)
return phoneNumber
}
def buildSubSelectContactsList(r: Row): java.util.List[String] = {
val phoneNumber = r.getSet(ContactsKeys.contacts, classOf[String])
return phoneNumber.toList
}
def receive: Receive = {
//...
case GetActiveContactsByPhoneNumber(phoneNumber: String) =>
val subQuery = QueryBuilder.select(ContactsKeys.contacts).
from(Keyspaces.akkaCassandra, ColumnFamilies.contact).
where(QueryBuilder.eq(ContactsKeys.phoneNumber, phoneNumber))
def queryActiveUsers(phoneNumbers: java.util.List[String]) = QueryBuilder.select(ContactsKeys.phoneNumber).
from(Keyspaces.akkaCassandra, ColumnFamilies.contact).
where(QueryBuilder.in(ContactsKeys.phoneNumber, phoneNumbers))
session.execute(subQuery) map((row: Row) => session.executeAsync(queryActiveUsers(buildSubSelectContactsList(row))) map(_.all().map(buildActiveContactByPhoneNumberResponse).toVector) pipeTo sender)
//...
}
}
Edit 4
I ran the code locally, controlling all the requests. When there are no requests, the number of running threads fluctuates around a certain number but doesn't tend to go up or down.
I made a variety of requests to see what will change.
The Image posted below shows several states.
I - no requests yet. number of threads 44-45
II - after a request to the ReaderActor. number of threads 46-47
III - after a request to the ReaderActor. number of threads 48-49
IV - after a request to the ReaderActor. number of threads 50-51
V - after a request to the WriterActor. number of threads 51-52 (but no problem, notice a daemon thread was started)
VI - after a request to the WriterActor. number of threads 51-52 (constant)
VII - after a request to the ReaderActor (but a different resource than the first three). number of threads 53-54
So what happens is every time we read from the database (regardless how many executeAsync calls are used) 2 extra threads are created. The only difference between the read and the write calls is that one uses the ask pattern and the other doesn't. I checked it by changing the route from:
get {
parameter('phonenumber) { phoneNumber: String =>
complete(cassandraReaderCall(GetActiveContactsByPhoneNumber(phoneNumber)))
}
}
to
get {
parameter('phonenumber) { phoneNumber: String =>
cassandraReaderWorker ! GetActiveContactsByPhoneNumber(phoneNumber)
complete(StatusCodes.OK)
}
}
obviously not getting any results now, but also not spawning those threads.
So the answer seems to lie in the ask pattern.
I hope somebody can provide an answer as to why this happens and how to solve it?

Execute multiple tasks in parallel, pick answer from first completed

I have n different sources from which to get, say, the rate of USD to EUR. Let n = 3 and the sources be Google, Yahoo, MyRates with corresponding methods:
def getYahooRate:Double = ???
def getGoogleRate:Double = ???
def getMyRate:Double = ???
I want to query the rate of USD to EUR in such a way that all n sources are polled in parallel and the first response to be received is immediately returned. If none reply within a specified time-frame, then an exception is thrown.
What is the canonical way to implement this using Scala (and if necessary Akka)?
Is there any library method that does most of this?
EDIT: Here is what I have tried. Some comments on the code would be appreciated:
This is somewhat like a parallel version of trycatch from this SO question. The code for the below method is based on this SO answer
type unitToT[T] = ()=>T
def trycatchPar[B](list:List[unitToT[B]], timeOut:Long):B = {
if (list.isEmpty) throw new Exception("call list must be non-empty")
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent._
import scala.concurrent.duration._
import scala.util.Failure
import scala.util.Success
val p = Promise[B]()
val futures = list.map(l => Future{l()})
futures foreach {
_ onComplete {
case s @ Success(_) => {
// Arbitrarily return the first success
p tryComplete s
}
case s @ Failure(_) =>
}
}
Await.result(p.future, timeOut millis)
}
You can use Future.firstCompletedOf
val first = Future.firstCompletedOf(futures)
Await.result(first, timeOut.millis)
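One caveat: Future.firstCompletedOf completes with whichever future finishes first, including a failed one, while the code in the question waits for the first success. A minimal standard-library sketch of the first-success variant follows; firstSuccessOf and the two sample futures are made-up names for illustration.

```scala
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.Success

// Hypothetical helper: completes with the first successful result,
// or fails with one of the errors if every future fails
def firstSuccessOf[A](futures: Seq[Future[A]]): Future[A] = {
  val p = Promise[A]()
  futures.foreach(_.onComplete {
    case s @ Success(_) => p.tryComplete(s)
    case _              => ()
  })
  // f.failed succeeds only when f fails, so this sequence succeeds only if all fail
  Future.sequence(futures.map(_.failed)).foreach { errors =>
    errors.headOption.foreach(e => p.tryFailure(e))
  }
  p.future
}

// Illustrative sources: one fails quickly, one answers later
val fastFail: Future[Double] = Future { throw new RuntimeException("source down") }
val slowOk: Future[Double] = Future { Thread.sleep(200); 1.10 }

// Await.result throws a TimeoutException if nothing succeeds in time
val rate: Double = Await.result(firstSuccessOf(Seq(fastFail, slowOk)), 2.seconds)
```

With an empty input the returned future never completes, so the emptiness check from the question is still worth keeping.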

scala 2.10 callback at the end of a `Deadline`

In Scala 2.10, along with the new Future/Promise API, they introduced the Duration and Deadline utilities (as described here). I looked around but couldn't find anything in the Scala standard library to do something like:
val deadline = 5 seconds fromNow
After(deadline){
//do stuff
}
//or
val deadlineFuture: Future[Nothing] = (5 seconds fromNow).asFuture
deadlineFuture onComplete {
//do stuff
}
Is there anything like that available that I've missed, or will I have to implement this kind of behavior myself?
Not quite built in, but they provide just enough rope.
The gist is to wait on an empty promise that must disappoint (i.e., time out).
import scala.concurrent._
import scala.concurrent.duration._
import scala.util._
import ExecutionContext.Implicits.global
object Test extends App {
val v = new SyncVar[Boolean]()
val deadline = 5 seconds fromNow
future(Await.ready(Promise().future, deadline.timeLeft)) onComplete { _ =>
println("Bye, now.")
v.put(true)
}
v.take()
// or
val w = new SyncVar[Boolean]()
val dropdeadline = 5 seconds fromNow
val p = Promise[Boolean]()
p.future onComplete {_ =>
println("Bye, now.")
w.put(true)
}
Try(Await.ready(Promise().future, dropdeadline.timeLeft))
p trySuccess true
w.take()
// rolling it
implicit class Expiry(val d: Deadline) extends AnyVal {
def expiring(f: =>Unit) {
future(Await.ready(Promise().future, d.timeLeft)) onComplete { _ =>
f
}
}
}
val x = new SyncVar[Boolean]()
5 seconds fromNow expiring {
println("That's all, folks.")
x.put(true)
}
x.take() // wait for it
}
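The versions above each park a thread in Await.ready until the deadline passes. A non-blocking alternative, sketched here under the assumption that a java.util.concurrent scheduled executor is acceptable as a timer, completes a Promise when the deadline expires. The names timer and whenExpired are hypothetical.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical shared timer: one thread serves all pending deadlines
val timer = Executors.newSingleThreadScheduledExecutor()

// Returns a Future that completes once the deadline has passed
def whenExpired(deadline: Deadline): Future[Unit] = {
  val promise = Promise[Unit]()
  val delayNanos = math.max(deadline.timeLeft.toNanos, 0L)
  timer.schedule(new Runnable {
    def run(): Unit = { promise.trySuccess(()); () }
  }, delayNanos, TimeUnit.NANOSECONDS)
  promise.future
}

val started = System.nanoTime()
val done: Future[Long] =
  whenExpired(200.millis.fromNow).map(_ => (System.nanoTime() - started) / 1000000L)

// Blocking only to observe the result in this sketch
val elapsedMillis: Long = Await.result(done, 2.seconds)
timer.shutdown()
```

Callbacks are attached with the usual onComplete or map, and no thread is held while waiting.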
It's just a timestamp holder. For example, suppose you need to distribute the execution of N sequential tasks over T hours. When you have finished the first one, you check the deadline and schedule the next task depending on the (time left)/(tasks left) interval. At some point isOverdue() becomes true, and you just execute the remaining tasks in parallel.
Or you could check isOverdue(), and if it is still false, use timeLeft to set the timeout for executing the next task, for example.
It's much better than manipulating Date and Calendar to determine the time left. Duration was also used in Akka for timing.
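The (time left)/(tasks left) calculation can be sketched with the standard library alone. The function name budgetPerTask is made up for illustration.

```scala
import scala.concurrent.duration._

// Hypothetical helper: how much time each remaining task may take.
// Returns zero once the deadline has passed or no tasks remain.
def budgetPerTask(deadline: Deadline, tasksLeft: Int): FiniteDuration =
  if (tasksLeft <= 0 || deadline.isOverdue()) Duration.Zero
  else deadline.timeLeft / tasksLeft.toLong

val deadline = 10.seconds.fromNow
val perTask = budgetPerTask(deadline, 4)               // at most 2.5 seconds each
val expired = budgetPerTask(Deadline.now - 1.second, 3) // deadline already passed
```

A zero budget is the signal, in the scheme described above, to stop pacing the tasks and run whatever is left in parallel.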