Scala futures and JMM - scala

I have a question about JMM and Scala futures.
In the following code, I have non-immutable Data class. I create an instance of it inside one thread(inside Future apply body), and then subscribe on completion event.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
object Hello extends App {
Future {
new Data(1, "2")
}.foreach { d =>
println(d)
}
Thread.sleep(100000)
}
class Data(var someInt: Int, var someString: String)
Can we guarantee that:
foreach body called from the same thread, where a Data instance was created?
If not, can we guarantee that actions inside the Future.apply happens-before(in terms of JMM) actions inside foreach body?

Completion happens-before callback execution.
Disclaimer: I am the main contributor.

I had a sort-of similar question, and what I found is -
1) in the doc Intellij so conveniently pulled up for me it says
Asynchronously processes the value in the future once the value becomes available...
2) on https://docs.scala-lang.org/overviews/core/futures.html it says
The result becomes available once the future completes.
Basically, it does not anywhere I can find say explicitly that there is a memory barrier. I suspect, however, that it is a safe assumption that there is. Otherwise the language would simply not work.

No.
You can get a good idea of this by looking through the source code for Promise/DefaultPromise/Future, which schedules the callback for foreach on the execution context/adds it to the listeners without any special logic requiring it to run on the original thread...
But you can also verify it experimentally, by trying to set up an execution context and threads such that something else will already be queued for execution when the Future in which Data was created completes.
implicit val context = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))
Future {
new Data(1, "2")
println("Data created on: " + Thread.currentThread().getName)
Thread.sleep(100)
}.foreach { _ =>
println("Data completed on: " + Thread.currentThread().getName)
}
Future { // occupies second thread
Thread.sleep(1000)
}
Future { // queue for execution while first future is still executing
Thread.sleep(2000)
}
My output:
Data created on: pool-$n-thread-1
Data completed on: pool-$n-thread-2
2.
Less confident here than I'd like to be, but I'll give it a shot:
Yes.
DefaultPromise, the construct underlying Future, is wrapping an atomic reference, which behaves like a volatile variable. Since the write-to for updating the result must happen prior to the read-from which passes the result to the listener so it can run the callback, JMM volatile variable rules turn this into a happens-before relationship.

I don't think there are any guarantees that foreach is called from the same thread
foreach will not be called until the future completes succesfully. onComplete is a more idiomatic way of providing a callback to process the result of a Future.

Related

java.lang.IllegalMonitorStateException when testing scala future

I have the following test class:
#RunWith(classOf[JUnitRunner])
class NodeScalaSuite extends FunSuite with ScalaFutures {
Within it, I added this test to check a method returning a future:
test("Any should return the first future") {
val p = Promise[Int]()
p completeWith Future.any(List(Future{wait(2000); 1}, Future{wait(1); 2}))
whenReady(p.future) {x =>
assert(true)
}
}
(I made the assert true just for simpler debugging.)
When I run the test suite I am getting this error:
[info] The future returned an exception of type: java.lang.IllegalMonitorStateException.
What could cause this?
According to docs for java.lang.Object#wait it
throws IllegalMonitorStateException if the current thread is not the owner of the object's monitor.
Which means wait should be called inside synchronized block. Something like synchronized { wait(2000) } should work but I think what you really want to do is to use Thread.sleep(2000). wait is meant to be used in combination with notify and notifyAll for synchronizing access to shared resource from multiple threads. It releases the object's monitor so another thread can execute the same synchronized block.

Activiti Java Service Task: Passivate w/out the need for receive task

this has already been answered but the solutions have not been working out for me.
Activiti asynchronous behaviour is fairly simple and only allows the user to enable a flag which tells activiti engine to insert such task in a execution queue (managing a pool of threads).
What i want is not to insert my java service task in a pool but to passivate its behaviour and only complete such task when an external signal is received and/or a callback is called.
My attempt:
class customAsyncTask extends TaskActivityBehavior {
override def execute(execution: ActivityExecution): Unit = {
val future = Future {
println(s"Executing customAsyncTask -> ${execution.getCurrentActivityName}, ${cur}")
}
future.onComplete {
case Success(result) => leave(execution)
case _ => // whatever
}
}
def signal(processInstanceId : String, transition : String) = {
val commandExecutor = main.processEngine.getProcessEngineConfiguration.asInstanceOf[ProcessEngineConfigurationImpl].getCommandExecutor
val command = new customSignal(processInstanceId, transition)
commandExecutor.execute(command)
}
}
On my previous code sample i have registered a scala future callback which when called will terminate the current activity and move to the next.
I also have a signal method which builds a custom signal that based on the processId and a name will call execution.take with the appropriate transition.
On both cases i am getting the following error (the bottom stack changes a little)
java.lang.NullPointerException
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.performOperationSync(ExecutionEntity.java:636)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.performOperation(ExecutionEntity.java:629)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.take(ExecutionEntity.java:453)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.take(ExecutionEntity.java:431)
at org.activiti.engine.impl.bpmn.behavior.BpmnActivityBehavior.performOutgoingBehavior(BpmnActivityBehavior.java:140)
at org.activiti.engine.impl.bpmn.behavior.BpmnActivityBehavior.performDefaultOutgoingBehavior(BpmnActivityBehavior.java:66)
at org.activiti.engine.impl.bpmn.behavior.FlowNodeActivityBehavior.leave(FlowNodeActivityBehavior.java:44)
at org.activiti.engine.impl.bpmn.behavior.AbstractBpmnActivityBehavior.leave(AbstractBpmnActivityBehavior.java:47)
Unfortunately, it is highly likely that the engine is erasing the information concerning the execution when the execute method returns, even though no complete/leave/take has been called. Even though my callback has the execution object in context, when i query for information using its proccess ID all i receive is null.
So, what i am doing wrong here? How can i achieve the behaviour that i want?
I dont see anything specific, I would have said you need to extend a class that implements SignalableActivityBehavior, but I think TaskActivityBehavior actually does this.
While the stack indicates the NPE is coming from the leave(), I am confused why leave is calling "take" since take is a transition event and really should only happen on a task labeled as synchronous.
All I can offer is, Camunda have an example implementation that is similar to your scenario. You may be able to use this to help you:
https://github.com/camunda/camunda-bpm-examples/tree/master/servicetask/service-invocation-asynchronous
It seems that activiti uses thread local variables which means that when calling methods from the scala threads (scala Executor Context) would be pointless since they do not share the context.
To solve all i have to do from my callback is make a signal call much like if i were calling from a remote system. The only difference is that i do not need to save my process instance identifier.
The code looks as such:
class AsynchronousServiceTask extends AbstractBpmnActivityBehavior {
val exec_id : String = "executionId"
override def execute(execution : ActivityExecution) = {
val future = Future { println("Something") }
future onComplete {
case _ => myobject.callSignalForMe(execution.getId)
}
}
override def signal(execution : ActivityExecution, signalName : String, signalData : AnyRef) = {
println("Signal called, leaving current activity..")
leave(execution)
}
}
Basically, myobject holds the runTimeEngine and will inject the signal in a ThreadLocal context. All clean and working as intended.

If you don't block on a future, can it cause a race like condition?

Say I have a controller that has a future call, and then I redirect to another page which expects that call to have computed. Is it possible the future doesn't return fast enough and when I redirect to the other page the data will not be "fresh"?
Example:
class HomeController {
def action1 = Action{
val f1 = Future { ... }
Redirect(routes.HomeController.action2)
}
def action2 = Action {
// expects the call to f1:Future to have executed
}
}
The reason I am asking is should my Service layer return Future's or block? Or should I pass the Future down to the calling code like in my controller.
UserService
save
delete
update
findById
etc. etc..
Should these return Future?
If I expected the call to have computed, I have to block. So is there no hard and fast rule for this?
There's a couple of things in your question that are somewhat tangential, so I'll address them in turn
1) Will the redirect response return before the future returns?
Impossible to tell and, further, it will not behave consistently. For the redirect to not return before the Future has completed its computation, you should use an async action instead, so that the call doesn't respond with the redirect until the Future completes:
def action1 = Action.async { request =>
futureComputation().map { _ => Redirect(routes.HomeController.action2) }
}
2) Should my Service layer return a Future?
The simple answer is yes, if the underlying API is also non-blocking.
The more nuanced answer is yes, even if your service call is blocking, but every other call that is used in your workflow returns a future. This second qualification is about composition and readability.
Suppose the workflow is:
findById
fetchHttpResource (async get)
update (with data from fetch
Because there is one async component, the workflow should be async and you might write it like this:
def workflow(id:UUID):Future[Unit] = {
val user = UserService.findById(id)
fetchHttpResource(id).map { resource =>
val updatedUser = user.copy(x = resource)
UserService.update(updatedUser)
}
}
If UserService returns Future's as well, you could use a for-comprehension instead:
def workflow(id: UUID): Future[Unit] = for {
user <- UserService.findById(id)
resource <- fetchHttpResource(id)
_ <- UserService.update(user.copy(x = resource))
} yield {}
which brings me to
3) Is there a hard and fast rule for using Future's?
Async code is a turtles all the way down affair. Once you have one async call in your chain, your chain has to return a Future and once you have that requirement, composition via for, map, flatMap, etc. and error handling becomes a lot cleaner among Future signatures. The additional benefit is that if you have a blocking call now, maybe in the future you can find an async API to use inside that service, your signature doesn't have to change.
For returning a Future from blocking calls that may fail the best policy is to wrap them with future {}. If however, you have computed data that cannot fail, using Future.successful() is the better choice.
As your code stands your action1 will kick off the code executed in the future then send the redirect to the browser. The is no guarantee that the code in the future will be done before action2 is called.
Howver, assuming this is play framework, you can wait for the code in the future to complete without blocking the current thread, by using the map or onComplete methods on Future to register callbacks, in which you'd complete the request by sending the redirect. You'll need to change your Action to Action.async.

Racy tail recursive function

I am trying to do some processing on a SynchronizedQueue using a tail recursive function. The function seems to work properly but the more I think about concurrency the more I believe I could have some race conditions when accessing this queue with different threads. Here is the function that I think I could use some help with:
val unsavedMessages = new SynchronizedQueue[CachedMessage]()
val MAX_BATCH = 256
val rowCount = new AtomicInteger()
private def validateCacheSize() = if (unsavedMessages.length > MAX_BATCH) {
implicit val batch = createBatch
val counter = rowCount.getAndIncrement
#tailrec
def processQueue(queue: SynchronizedQueue[CachedMessage]): Unit = if (queue.nonEmpty) {
val cm = queue.dequeue
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
processQueue(queue)
}
processQueue(unsavedMessages)
executeBatch
resetQueue
}
def resetQueue = unsavedMessages.clear
Multiple threads call this function:
def add(request: WebserviceRuleMatch, timestamp: Long, brokerId: String) = {
validateCacheSize
//log.info("enquing request "+ unsavedMessages.length)
unsavedMessages.enqueue(CachedMessage(request, timestamp, brokerId))
}
Does anyone have any pointers on how to improve this so there would likely not be a race condition?
there could be a chance that the queue gets emptied between queue.nonempty and queue.dequeue
Avoid calling multiple queue operations that must be synchronized within your code. Use the power of SynchronizedQueue to do atomic thread-safe operations. E.g. avoid calling queue.nonempty altogether (alternative to tail-recursion):
for (cm <- unsavedMessages.dequeueAll(_ => true))
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
executeBatch
//resetQueue -- Don't do this! Not thread-safe
I think messages could be added by a thread between processQueue and resetQueue
There will always be a point at which your code has taken a 'snapshot' of the queue and emptied it. My previous point ensured that the 'snapshot' and emptying are a single atomic operation. If new entries are enqueued at any point after that atomic 'snapshot & empty' operation - no problem. Your 'snapshot & empty' must occur somewhere and new items enqueued are a fact of life. Make the decision to allow new items to be enqueued at any point subsequent to the 'snapshot & empty'. They'll be processed on next cycle. i.e. nothing extra needed beyond above point.
Robin Green: (By the way, that method seems to have a very misleading name!)
Wot he said! :)
The add function gets gets called from a future so I feel as though there could be a chance that the queue gets emptied between queue.nonempty and queue.dequeue.
Yes, it could. You could use double-checked locking to make validateCacheSize single-threaded. (By the way, that method seems to have a very misleading name!)
Also I think messages could be added by a thread between processQueue and resetQueue.
Yes, they could. But why do you need to call unsavedMessages.clear at all? queue.dequeue already removes them from the queue. So the only unsavedMessages that should exist in the queue then are ones that still remain to be processed.

Querying a continously running operation for its current state/value in Scala

I have a procedure that continuously updates a value. I want to be able to periodically query the operation for the current value. In my particular example, every update can be considered an improvement and the procedure will eventually converge on a final, best answer, but I want/need access to the intermediate results. The speed with which the loop executes and the time it takes to converge matters.
As an example, consider this loop:
var current = 0
while(current < 100){
current = current + 1
}
I want to be able to get value of current on any loop iteration.
A solution with an Actor would be:
class UpdatingActor extends Actor{
var current : Int = 0
def receive = {
case Update => {
current = current + 1
if (current < 100) self ! Update
}
case Query => sender ! current
}
}
You could get rid of the var using become or FSM, but this example is more clear IMO.
Alternatively, one actor could run the operation and send updated results on every loop iteration to another actor, whose sole responsibility is updating the value and responding to queries about it. I don't know much about "agents" in Akka, but this seems like a potential use case for one.
What are better/alternative ways of doing this using Scala? I don't need to use actors; that was just one solution that came to mind.
Your actor-based solution is ok.
Sending the intermediate result after each change to a "result provider" actor would be a good idea as well if the calculation blocks the actor for a long time and you want to make sure that you can always get the intermediate result. Another alternative would be to make the actual calculator actor a child of the actor that collects the best result. That way the thing acts as a single actor from the outside, and you have the actor that has state (the current best result) separated from the actor that does the computation, which might fail.
An agent would be a solution somewhat between the very low level #volatile/AtomicInteger approach and an Actor. An agent is something that can only be modified by running a transform on it (and there is a queue for transforms), but which has a current state that can always be accessed. It is not location transparent though. so stay with the actor approach if you need that.
Here is how you would solve this with an agent. You have one thread which does a long-running calculation (simulated by Thread.sleep) and another thread that just prints out the best current result in regular intervals (also simulated by Thread.sleep).
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent._
import akka.agent.Agent
object Main extends App {
val agent = Agent(0)
def computation() : Unit = {
for(i<-0 until 100) {
agent.send { current =>
Thread.sleep(1000) // to simulate a long-running computation
current + 1
}
}
}
def watch() : Unit = {
while(true) {
println("Current value is " + agent.get)
Thread.sleep(1000)
}
}
global.execute(new Runnable {
def run() = computation
})
watch()
}
But all in all I think an actor-based solution would be superior. For example you could do the calculation on a different machine than the result tracking.
The scope of the question is a little wide, but I'll try :)
First, your example is perfectly fine, I don't see the point of getting rid of the var. This is what actors are for: protect mutable state.
Second, based on what you describe you don't need an actor at all.
class UpdatingActor {
private var current = 0
def startCrazyJob() {
while(current < 100){
current = current + 1
}
}
def soWhatsGoingOn: Int = current
}
You just need one thread to call startCrazyJob and a second one that will periodically call soWhatsGoingOn.
IMHO, the actor approach is better, but it's up to you to decide if it's worth importing the akka library just for this use case.