Racy tail recursive function - scala

I am trying to do some processing on a SynchronizedQueue using a tail recursive function. The function seems to work properly but the more I think about concurrency the more I believe I could have some race conditions when accessing this queue with different threads. Here is the function that I think I could use some help with:
val unsavedMessages = new SynchronizedQueue[CachedMessage]()
val MAX_BATCH = 256
val rowCount = new AtomicInteger()
private def validateCacheSize() = if (unsavedMessages.length > MAX_BATCH) {
implicit val batch = createBatch
val counter = rowCount.getAndIncrement
@tailrec
def processQueue(queue: SynchronizedQueue[CachedMessage]): Unit = if (queue.nonEmpty) {
val cm = queue.dequeue
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
processQueue(queue)
}
processQueue(unsavedMessages)
executeBatch
resetQueue
}
def resetQueue = unsavedMessages.clear
Multiple threads call this function:
def add(request: WebserviceRuleMatch, timestamp: Long, brokerId: String) = {
validateCacheSize
//log.info("enqueuing request "+ unsavedMessages.length)
unsavedMessages.enqueue(CachedMessage(request, timestamp, brokerId))
}
Does anyone have any pointers on how to improve this so there would likely not be a race condition?

There could be a chance that the queue gets emptied between queue.nonEmpty and queue.dequeue
Avoid calling multiple queue operations that must be synchronized together within your code. Use the power of SynchronizedQueue to do atomic thread-safe operations. E.g. avoid calling queue.nonEmpty altogether (an alternative to the tail recursion):
for (cm <- unsavedMessages.dequeueAll(_ => true))
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
executeBatch
//resetQueue -- Don't do this! Not thread-safe
I think messages could be added by a thread between processQueue and resetQueue
There will always be a point at which your code has taken a 'snapshot' of the queue and emptied it. My previous point ensured that the 'snapshot' and emptying are a single atomic operation. If new entries are enqueued at any point after that atomic 'snapshot & empty' operation - no problem. Your 'snapshot & empty' must occur somewhere and new items enqueued are a fact of life. Make the decision to allow new items to be enqueued at any point subsequent to the 'snapshot & empty'. They'll be processed on next cycle. i.e. nothing extra needed beyond above point.
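A minimal sketch of the same "check and remove in one atomic step" idea, for readers on a Scala version where SynchronizedQueue is deprecated or unavailable. It swaps in java.util.concurrent.ConcurrentLinkedQueue, whose poll() returns null when the queue is empty, so there is no separate nonEmpty check that could race with the removal. Msg and drain are hypothetical names, not part of the question's code.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.annotation.tailrec

object DrainSketch {
  // Hypothetical stand-in for the question's CachedMessage.
  final case class Msg(id: Int)

  // poll() tests for emptiness and removes the head in a single atomic call,
  // so no other thread can empty the queue between a check and a dequeue.
  @tailrec
  def drain(queue: ConcurrentLinkedQueue[Msg], acc: List[Msg] = Nil): List[Msg] =
    queue.poll() match {
      case null => acc.reverse          // queue was empty at this instant
      case m    => drain(queue, m :: acc)
    }
}
```

Anything enqueued after the final poll() returns null simply stays in the queue for the next cycle, which is the behaviour described above.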
Robin Green: (By the way, that method seems to have a very misleading name!)
Wot he said! :)

The add function gets called from a future so I feel as though there could be a chance that the queue gets emptied between queue.nonEmpty and queue.dequeue.
Yes, it could. You could use double-checked locking to make validateCacheSize single-threaded. (By the way, that method seems to have a very misleading name!)
Also I think messages could be added by a thread between processQueue and resetQueue.
Yes, they could. But why do you need to call unsavedMessages.clear at all? queue.dequeue already removes them from the queue. So the only unsavedMessages that should exist in the queue then are ones that still remain to be processed.
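One way the double-checked idea could look, as a rough sketch rather than a drop-in replacement: the size is checked once without a lock, and again under the lock, so only one thread at a time drains and flushes. BatchBuffer, flushLock and the flush callback are hypothetical names, and the queue is a ConcurrentLinkedQueue (an assumption, not the question's SynchronizedQueue).

```scala
import java.util.concurrent.ConcurrentLinkedQueue

object BatchBufferSketch {
  // `flush` stands in for the question's createBatch/addToBatch/executeBatch steps.
  final class BatchBuffer(maxBatch: Int, flush: List[String] => Unit) {
    private val queue = new ConcurrentLinkedQueue[String]()
    private val flushLock = new Object

    def add(item: String): Unit = {
      queue.add(item)
      // First check without the lock: cheap, but may be stale.
      // (ConcurrentLinkedQueue.size is O(n); acceptable for a sketch.)
      if (queue.size > maxBatch) flushLock.synchronized {
        // Second check under the lock: another thread may have flushed already.
        if (queue.size > maxBatch) flush(drainAll())
      }
    }

    // Removes elements one by one; no clear() is needed afterwards.
    private def drainAll(): List[String] = {
      val builder = List.newBuilder[String]
      var next = queue.poll()
      while (next != null) {
        builder += next
        next = queue.poll()
      }
      builder.result()
    }
  }
}
```

Items added by other threads while a flush is running are either picked up by the current drain or left for the next one; none are discarded, because nothing ever calls clear.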

Related

How to do a `getOrElseComplete` on `Promise`?

Does it make sense to have an operation like getOrElseComplete that tries to complete a Promise with a value but, if the Promise is already completed, returns the existing completed value instead. Here's a sample implementation:
implicit class PromiseOps[T](promise: Promise[T]) {
def getOrElseComplete(value: Try[T]): Try[T] = {
val didComplete = promise.tryComplete(value)
if (didComplete) {
value
} else {
// The `tryComplete` returned `false`, which means the promise
// was already completed, so the following is safe (?)
promise.future.value.get
}
}
}
Is this safe to do? If not, why not? If so, is there a way to do this directly (eg. without relying on questionable things like _.value.get) or why isn't there such a way in the standard library?
From your comments it seems to me that this is a valid solution for your problem but I also feel that a method like this doesn't belong in Promise API because Promise is supposed to be only a settable "backend" of its Future.
I'd prefer to have a completely separate function which could look like this:
def getOrComplete[T](promise: Promise[T], value: Try[T]): Try[T] =
promise.future.value.getOrElse {
if (promise.tryComplete(value)) value
else getOrComplete(promise, value)
}
The recursive call may seem weird - it serves two purposes:
it protects against a race condition where some other thread completes the future just before we call tryComplete
it avoids usage of .value.get on the Future
You might also want to pass value as a by-name parameter to avoid evaluating it when the Promise is already completed.
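A sketch of that by-name variant, assuming only the standard scala.concurrent API. The value is wrapped in a lazy val so that a retry after a lost race does not evaluate it a second time; PromiseSketch, candidate and loop are names made up for the illustration.

```scala
import scala.annotation.tailrec
import scala.concurrent.Promise
import scala.util.Try

object PromiseSketch {
  def getOrComplete[T](promise: Promise[T])(value: => Try[T]): Try[T] = {
    // Evaluated at most once, and only if the promise looks incomplete.
    lazy val candidate = value

    @tailrec
    def loop(): Try[T] = promise.future.value match {
      case Some(existing) => existing            // already completed: value is never evaluated
      case None =>
        if (promise.tryComplete(candidate)) candidate
        else loop()                              // lost the race: re-read the completed value
    }

    loop()
  }
}
```

Usage would be along the lines of PromiseSketch.getOrComplete(p)(Try(expensive())), where expensive is whatever computation you want to skip when the promise is already completed.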
This operation does what it promises. It may make more sense to take value by name and not try to complete if the promise is already completed, maybe something like
def getOrElseComplete(value: => Try[T]): Try[T] = {
if (!promise.isCompleted) {
promise.tryComplete(value)
}
promise.future.value.get
}
It's kinda dodgy though. Sharing a promise and having multiple places where it might be completed sounds like a difficult to maintain design, and one has to ask what's happening with the other path that might still complete the Promise? Shouldn't something be cancelled there?

Scala futures and JMM

I have a question about JMM and Scala futures.
In the following code, I have a mutable Data class. I create an instance of it inside one thread (inside the Future.apply body), and then subscribe to the completion event.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
object Hello extends App {
Future {
new Data(1, "2")
}.foreach { d =>
println(d)
}
Thread.sleep(100000)
}
class Data(var someInt: Int, var someString: String)
Can we guarantee that:
the foreach body is called on the same thread that created the Data instance?
If not, can we guarantee that the actions inside Future.apply happen-before (in JMM terms) the actions inside the foreach body?
Completion happens-before callback execution.
Disclaimer: I am the main contributor.
I had a sort-of similar question, and what I found is -
1) in the doc IntelliJ so conveniently pulled up for me it says
Asynchronously processes the value in the future once the value becomes available...
2) on https://docs.scala-lang.org/overviews/core/futures.html it says
The result becomes available once the future completes.
Basically, I cannot find anywhere that says explicitly that there is a memory barrier. I suspect, however, that it is safe to assume there is one. Otherwise the language would simply not work.
1.
No.
You can get a good idea of this by looking through the source code for Promise/DefaultPromise/Future, which schedules the callback for foreach on the execution context/adds it to the listeners without any special logic requiring it to run on the original thread...
But you can also verify it experimentally, by trying to set up an execution context and threads such that something else will already be queued for execution when the Future in which Data was created completes.
implicit val context = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))
Future {
new Data(1, "2")
println("Data created on: " + Thread.currentThread().getName)
Thread.sleep(100)
}.foreach { _ =>
println("Data completed on: " + Thread.currentThread().getName)
}
Future { // occupies second thread
Thread.sleep(1000)
}
Future { // queue for execution while first future is still executing
Thread.sleep(2000)
}
My output:
Data created on: pool-$n-thread-1
Data completed on: pool-$n-thread-2
2.
Less confident here than I'd like to be, but I'll give it a shot:
Yes.
DefaultPromise, the construct underlying Future, is wrapping an atomic reference, which behaves like a volatile variable. Since the write-to for updating the result must happen prior to the read-from which passes the result to the listener so it can run the callback, JMM volatile variable rules turn this into a happens-before relationship.
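To make concrete what that guarantee is being relied on for, here is a small sketch: the fields are plain (non-volatile) vars written inside the Future body, and the callback reads them, possibly on another thread. It illustrates the pattern under the happens-before reasoning above; a passing run is not proof of the guarantee. VisibilitySketch and roundTrip are hypothetical names.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object VisibilitySketch {
  final class Data(var someInt: Int, var someString: String)

  def roundTrip(): (Int, String) = {
    val seen: Future[(Int, String)] =
      Future {
        val d = new Data(0, "")
        d.someInt = 1        // plain writes, no synchronization of our own
        d.someString = "2"
        d
      }.map(d => (d.someInt, d.someString)) // callback may run on a different thread

    // Blocking here is only to hand the result back to the caller of the sketch.
    Await.result(seen, 5.seconds)
  }
}
```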
I don't think there are any guarantees that foreach is called from the same thread
foreach will not be called until the future completes successfully. onComplete is a more idiomatic way of providing a callback to process the result of a Future.

Iterator.continually vs while(..) in Scala

def task(): Unit = {
Thread.sleep(60*1000) // update lru cache every minute
// do some compute-intensive task here to populate lru cache
println("LRU Updated!")
}
new Thread {
override def run = while(true) task()
}.start
vs
Iterator.continually(task()).dropWhile(_=>true)
have the exact same behavior. Are they equivalent under the hood as well?
Iterator.continually is useful to get an iterator that you can then use like a pseudo collection, for non side effecting results. Here you are doing a println, so continually is a nice sugar, but it doesn't really give you anything.
If you had something like:
def task(): String = {
Thread.sleep(60*1000) // update lru cache every minute
// do some compute-intensive task here to populate lru cache
"LRU Updated!"
}
To apply a side effect, you could still use a thread, but you wouldn't be able to do much more (without more complex code).
new Thread {
override def run = while(true) println(task())
}.start
If you wanted to transform the output, or push it to another call, or compose it, etc. then an iterator would be a nicer abstraction:
Iterator.continually(task()).map(x => s"$x Yay!").take(100)
But I guess that's not really what you want to do from your example.
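For completeness, a tiny self-contained sketch of that composition, with the sleep and the real work replaced by a counter so it runs instantly. ContinuallySketch, firstResults and tick are hypothetical names.

```scala
object ContinuallySketch {
  def firstResults(n: Int): List[String] = {
    var tick = 0
    // Stand-in for the real task: returns a value instead of printing it.
    def task(): String = { tick += 1; s"update $tick" }

    // Nothing runs until the iterator is consumed; take(n) bounds the otherwise
    // infinite stream and toList forces exactly n evaluations of task().
    Iterator.continually(task()).map(x => s"$x Yay!").take(n).toList
  }
}
```

Calling ContinuallySketch.firstResults(3) gives List("update 1 Yay!", "update 2 Yay!", "update 3 Yay!").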

Querying a continuously running operation for its current state/value in Scala

I have a procedure that continuously updates a value. I want to be able to periodically query the operation for the current value. In my particular example, every update can be considered an improvement and the procedure will eventually converge on a final, best answer, but I want/need access to the intermediate results. The speed with which the loop executes and the time it takes to converge matters.
As an example, consider this loop:
var current = 0
while(current < 100){
current = current + 1
}
I want to be able to get the value of current on any loop iteration.
A solution with an Actor would be:
class UpdatingActor extends Actor{
var current : Int = 0
def receive = {
case Update => {
current = current + 1
if (current < 100) self ! Update
}
case Query => sender ! current
}
}
You could get rid of the var using become or FSM, but this example is more clear IMO.
Alternatively, one actor could run the operation and send updated results on every loop iteration to another actor, whose sole responsibility is updating the value and responding to queries about it. I don't know much about "agents" in Akka, but this seems like a potential use case for one.
What are better/alternative ways of doing this using Scala? I don't need to use actors; that was just one solution that came to mind.
Your actor-based solution is ok.
Sending the intermediate result after each change to a "result provider" actor would be a good idea as well if the calculation blocks the actor for a long time and you want to make sure that you can always get the intermediate result. Another alternative would be to make the actual calculator actor a child of the actor that collects the best result. That way the thing acts as a single actor from the outside, and you have the actor that has state (the current best result) separated from the actor that does the computation, which might fail.
An agent would be a solution somewhere between the very low-level @volatile/AtomicInteger approach and an Actor. An agent is something that can only be modified by running a transform on it (and there is a queue for transforms), but which has a current state that can always be accessed. It is not location transparent, though, so stay with the actor approach if you need that.
Here is how you would solve this with an agent. You have one thread which does a long-running calculation (simulated by Thread.sleep) and another thread that just prints out the best current result in regular intervals (also simulated by Thread.sleep).
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent._
import akka.agent.Agent
object Main extends App {
val agent = Agent(0)
def computation() : Unit = {
for(i<-0 until 100) {
agent.send { current =>
Thread.sleep(1000) // to simulate a long-running computation
current + 1
}
}
}
def watch() : Unit = {
while(true) {
println("Current value is " + agent.get)
Thread.sleep(1000)
}
}
global.execute(new Runnable {
def run(): Unit = computation()
})
watch()
}
But all in all I think an actor-based solution would be superior. For example you could do the calculation on a different machine than the result tracking.
The scope of the question is a little wide, but I'll try :)
First, your example is perfectly fine, I don't see the point of getting rid of the var. This is what actors are for: protect mutable state.
Second, based on what you describe you don't need an actor at all.
class UpdatingActor {
// @volatile so that the thread calling soWhatsGoingOn sees the latest write
@volatile private var current = 0
def startCrazyJob(): Unit = {
while(current < 100){
current = current + 1
}
}
def soWhatsGoingOn: Int = current
}
You just need one thread to call startCrazyJob and a second one that will periodically call soWhatsGoingOn.
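A sketch of that two-thread setup using an AtomicInteger, which is one way (an assumption on my part, not the only option) to make sure the querying thread actually sees the worker's updates. ProgressSketch, Job and progress are hypothetical names.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ProgressSketch {
  final class Job(limit: Int) {
    private val current = new AtomicInteger(0)

    // Called on the worker thread; each increment is visible to readers.
    def run(): Unit = while (current.get < limit) current.incrementAndGet()

    // Safe to call from any thread at any time.
    def progress: Int = current.get
  }
}
```

A caller could start the job with new Thread(new Runnable { def run(): Unit = job.run() }).start() and then poll job.progress from a second thread as often as it likes.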
IMHO, the actor approach is better, but it's up to you to decide if it's worth importing the akka library just for this use case.

Changing Akka actor state by passing a method with arguments to "become"

I am having some trouble using become in my Akka actor. Basically, my actor has a structure like so:
// This is where I store information received by the actor
// In my real application it has more fields, though.
case class Information(list:List[AnyRef]) {
def received(x:AnyRef) = {
Information(list :+ x)
}
}
class MyActor extends Actor {
// Initial receive block that simply waits for a "start" signal
def receive = {
case Start => {
become(waiting(Information(List())))
}
}
// The main waiting state. In my real application, I have multiple of
// these which all have a parameter of type "Information"
def waiting(info:Information):Receive = {
// If a certain amount of messages was received, I decide what action
// to take next.
if(someCondition) {
decideNextState(info)
}
return {
case Bar(x) => {
//
// !!! Problem occurs here !!!
//
// This is where the problem occurs, apparently. After a decision has been
// made, (i.e. decideNextState was invoked), the info list should've been
// cleared. But when I check the size of the info list here, after a decision
// has been made, it appears to still contain all the messages received
// earlier.
//
become(waiting(info received x))
}
}
}
def decideNextState(info:Information) {
// Some logic, then the received information list is cleared and
// we enter a new state.
become(waiting(Information(List())))
}
}
Sorry for the long code snippet, but I couldn't really make it any smaller.
The part where the problem occurs is marked in the comments. I am passing a parameter to the method that returns the Receive partial function which is then passed to the become method. However, the created partial function seems to somehow preserve state from an earlier invocation. I find the problem a bit difficult to explain, but I did my best to do so in the comments in the code, so please read those and I'll answer anything that is unclear.
Your logic is a little convoluted but I'll take a shot at what could be the problem:
If someCondition is true then your actor steps into a state, let's call it S1 characterized by a value Information(List()). And then you return (by the way, avoid using return unless it is absolutely necessary) a receive method which will put your actor into a state S2 characterized by a list Information(somePreviousList :+ x). So at this point your stack of states has S1 on top. But when you receive a Bar(x) message the state S2 will be pushed, thus covering S1 and you actually transition into a state characterized by an Information with the old values + your new x.
Or something like that, the recursion in your actor is a bit mesmerizing.
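If it helps, here is a toy model of the effect with no Akka dependency. It assumes only that become replaces the current behavior and that the last call wins, which matches Akka's default of discarding the old behavior. ToyActor, threshold and sizesSeen are made-up names, and threshold (assumed to be at least 1) stands in for someCondition.

```scala
object BecomeSketch {
  type Receive = PartialFunction[Any, Unit]

  final class ToyActor(threshold: Int) {
    private var behavior: Receive = waiting(Nil)
    var sizesSeen: Vector[Int] = Vector.empty

    // Toy `become`: just replaces the current behavior.
    private def become(next: Receive): Unit = behavior = next

    private def waiting(info: List[Any]): Receive = {
      // The "decision": switch to a state with a cleared list...
      if (info.size >= threshold) become(waiting(Nil))
      // ...but the handler returned here still closes over the old `info`,
      // and the caller's own become(...) installs it afterwards.
      val handler: Receive = { case x =>
        sizesSeen :+= info.size
        become(waiting(info :+ x))
      }
      handler
    }

    def send(msg: Any): Unit = behavior(msg)
  }
}
```

With a threshold of 2, sending four messages records the sizes 0, 1, 2, 3: the inner become that clears the list is immediately overwritten by the outer one, so the list is never seen as cleared.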
But I'll suggest rewriting that code since it seems that the state which changes is something of type Information and you are manipulating this state using Akka's actor state transitions which is not at all the best tool to do that. become and unbecome are meant to be used to transition from different states of the actor's behavior. That is, an actor can have a different behavior at any time and you use become and unbecome to change between these behaviors.
Why not do something like this?
class MyActor extends Actor {
private var info = Information(List.empty)
def receive = {
case Start => info = Information(List()) //a bit redundant, but it's just to match 1:1 with your code
case Bar(x) => {
if (someCondition) {
info = Information(List.empty)
}
info = info received x
}
}
}
I might not have captured your entire idea, but you get the picture.