Iterator.continually vs while(..) in Scala

def task() {
Thread.sleep(60*1000) // update lru cache every minute
// do some compute-intensive task here to populate lru cache
println("LRU Updated!")
}
new Thread {
override def run = while(true) task()
}.start
vs
Iterator.continually(task()).dropWhile(_=>true)
have the exact same behavior. Are they equivalent under the hood as well?

Iterator.continually is useful for getting an iterator that you can then use like a pseudo-collection, for non-side-effecting results. Here you are only doing a println, so continually is nice sugar, but it doesn't really give you anything.
If you had something like:
def task(): String = {
Thread.sleep(60*1000) // update lru cache every minute
// do some compute-intensive task here to populate lru cache
"LRU Updated!"
}
To apply a side effect, you could still use a thread, but you wouldn't be able to do much more (without more complex code).
new Thread {
override def run = while(true) println(task())
}.start
If you wanted to transform the output, or push it to another call, or compose it, etc. then an iterator would be a nicer abstraction:
Iterator.continually(task()).map(x => s"$x Yay!").take(100)
But I guess that's not really what you want to do from your example.
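A minimal sketch of that composition, assuming a hypothetical task() that returns a String and skips the one-minute sleep so it finishes quickly. The names IteratorSketch, runCount and pipeline are invented for this illustration. It shows that building the pipeline runs nothing; the task only executes when the iterator is consumed.

```scala
object IteratorSketch {
  // Hypothetical stand-in for the task above, without the one-minute sleep.
  private var runs = 0

  def task(): String = {
    runs += 1
    s"LRU Updated! (#$runs)"
  }

  def runCount: Int = runs

  // Iterator.continually, map and take are all lazy: nothing runs here yet.
  def pipeline: Iterator[String] =
    Iterator.continually(task()).map(x => s"$x Yay!").take(3)
}
```

Calling IteratorSketch.pipeline.toList is what actually forces the three task() calls, on the calling thread rather than on a new one.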

Related

How to do a `getOrElseComplete` on `Promise`?

Does it make sense to have an operation like getOrElseComplete that tries to complete a Promise with a value but, if the Promise is already completed, returns the existing completed value instead? Here's a sample implementation:
implicit class PromiseOps[T](promise: Promise[T]) {
def getOrElseComplete(value: Try[T]): Try[T] = {
val didComplete = promise.tryComplete(value)
if (didComplete) {
value
} else {
// The `tryComplete` returned `false`, which means the promise
// was already completed, so the following is safe (?)
promise.future.value.get
}
}
}
Is this safe to do? If not, why not? If so, is there a way to do this directly (e.g. without relying on questionable things like _.value.get), or why isn't there such a way in the standard library?
From your comments it seems to me that this is a valid solution for your problem, but I also feel that a method like this doesn't belong in the Promise API, because a Promise is supposed to be only a settable "backend" of its Future.
I'd prefer to have a completely separate function which could look like this:
def getOrComplete[T](promise: Promise[T], value: Try[T]): Try[T] =
promise.future.value.getOrElse {
if (promise.tryComplete(value)) value
else getOrComplete(promise, value)
}
The recursive call may seem weird - it serves two purposes:
it protects against a race condition where some other thread completes the future just before we call tryComplete
it avoids usage of .value.get on the Future
You might also want to pass value as a by-name parameter to avoid evaluating it when the Promise is already completed.
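A sketch of that by-name variant, under the assumption that evaluating value at most once per call is acceptable. The object name PromiseSketch and the inner helper loop are invented for this illustration; only Promise.future, Future.value and Promise.tryComplete come from the standard library.

```scala
import scala.annotation.tailrec
import scala.concurrent.Promise
import scala.util.Try

object PromiseSketch {
  def getOrComplete[T](promise: Promise[T], value: => Try[T]): Try[T] = {
    // Retries until we either observe an existing result or win the race.
    @tailrec
    def loop(v: Try[T]): Try[T] = promise.future.value match {
      case Some(existing) => existing
      case None           => if (promise.tryComplete(v)) v else loop(v)
    }
    // Option.getOrElse takes its argument by name, so `value` is only
    // evaluated when the promise is not yet completed at this point.
    promise.future.value.getOrElse(loop(value))
  }
}
```

The first caller's value wins; later callers get the stored result back and their value expression is never evaluated.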
This operation does what it promises. It may make more sense to take value by name, and not try to complete if already completed, maybe something like
def getOrElseComplete(value: => Try[T]): Try[T] = {
if (!promise.isCompleted) {
promise.tryComplete(value)
}
promise.future.value.get
}
It's kinda dodgy though. Sharing a promise and having multiple places where it might be completed sounds like a difficult-to-maintain design, and one has to ask what's happening with the other path that might still complete the Promise. Shouldn't something be cancelled there?

Scala futures and JMM

I have a question about JMM and Scala futures.
In the following code, I have a mutable Data class. I create an instance of it inside one thread (inside the Future.apply body), and then subscribe to the completion event.
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future
object Hello extends App {
Future {
new Data(1, "2")
}.foreach { d =>
println(d)
}
Thread.sleep(100000)
}
class Data(var someInt: Int, var someString: String)
Can we guarantee that:
1. the foreach body is called from the same thread in which the Data instance was created?
2. If not, can we guarantee that actions inside Future.apply happen-before (in terms of the JMM) actions inside the foreach body?
Completion happens-before callback execution.
Disclaimer: I am the main contributor.
I had a sort-of similar question, and what I found is -
1) in the doc IntelliJ so conveniently pulled up for me it says
Asynchronously processes the value in the future once the value becomes available...
2) on https://docs.scala-lang.org/overviews/core/futures.html it says
The result becomes available once the future completes.
Basically, nowhere that I can find does it say explicitly that there is a memory barrier. I suspect, however, that it is a safe assumption that there is. Otherwise the language would simply not work.
No.
You can get a good idea of this by looking through the source code for Promise/DefaultPromise/Future, which schedules the callback for foreach on the execution context/adds it to the listeners without any special logic requiring it to run on the original thread...
But you can also verify it experimentally, by trying to set up an execution context and threads such that something else will already be queued for execution when the Future in which Data was created completes.
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}
implicit val context = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(2))
Future {
new Data(1, "2")
println("Data created on: " + Thread.currentThread().getName)
Thread.sleep(100)
}.foreach { _ =>
println("Data completed on: " + Thread.currentThread().getName)
}
Future { // occupies second thread
Thread.sleep(1000)
}
Future { // queue for execution while first future is still executing
Thread.sleep(2000)
}
My output:
Data created on: pool-$n-thread-1
Data completed on: pool-$n-thread-2
2.
Less confident here than I'd like to be, but I'll give it a shot:
Yes.
DefaultPromise, the construct underlying Future, is wrapping an atomic reference, which behaves like a volatile variable. Since the write-to for updating the result must happen prior to the read-from which passes the result to the listener so it can run the callback, JMM volatile variable rules turn this into a happens-before relationship.
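As a rough illustration of what that guarantee would mean in practice, the sketch below writes to plain (non-volatile) fields inside the Future body and reads them in a callback registered with map, which goes through the same completion mechanism as foreach. The names HappensBeforeSketch and callbackView are made up for this example. It only shows the pattern; a passing run does not by itself prove the memory-model guarantee.

```scala
import scala.concurrent.{ExecutionContext, Future}

object HappensBeforeSketch {
  final class Data(var someInt: Int, var someString: String)

  def callbackView()(implicit ec: ExecutionContext): Future[(Int, String)] =
    Future {
      val d = new Data(0, "")
      d.someInt = 1      // plain writes, possibly on one pool thread
      d.someString = "2"
      d
    }.map { d =>
      // The callback may run on another thread. If completion
      // happens-before callback execution, it sees both writes.
      (d.someInt, d.someString)
    }
}
```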
I don't think there are any guarantees that foreach is called from the same thread
foreach will not be called until the future completes successfully. onComplete is a more idiomatic way of providing a callback to process the result of a Future.

Activiti Java Service Task: Passivate w/out the need for receive task

This has already been answered, but the solutions have not been working out for me.
Activiti's asynchronous behaviour is fairly simple and only allows the user to enable a flag which tells the Activiti engine to insert such a task in an execution queue (managing a pool of threads).
What I want is not to insert my Java service task in a pool but to passivate its behaviour and only complete the task when an external signal is received and/or a callback is called.
My attempt:
class customAsyncTask extends TaskActivityBehavior {
override def execute(execution: ActivityExecution): Unit = {
val future = Future {
println(s"Executing customAsyncTask -> ${execution.getCurrentActivityName}, ${cur}")
}
future.onComplete {
case Success(result) => leave(execution)
case _ => // whatever
}
}
def signal(processInstanceId : String, transition : String) = {
val commandExecutor = main.processEngine.getProcessEngineConfiguration.asInstanceOf[ProcessEngineConfigurationImpl].getCommandExecutor
val command = new customSignal(processInstanceId, transition)
commandExecutor.execute(command)
}
}
In the code sample above I have registered a Scala future callback which, when called, will terminate the current activity and move to the next.
I also have a signal method which builds a custom signal that, based on the processId and a name, will call execution.take with the appropriate transition.
In both cases I am getting the following error (the bottom of the stack changes a little):
java.lang.NullPointerException
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.performOperationSync(ExecutionEntity.java:636)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.performOperation(ExecutionEntity.java:629)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.take(ExecutionEntity.java:453)
at org.activiti.engine.impl.persistence.entity.ExecutionEntity.take(ExecutionEntity.java:431)
at org.activiti.engine.impl.bpmn.behavior.BpmnActivityBehavior.performOutgoingBehavior(BpmnActivityBehavior.java:140)
at org.activiti.engine.impl.bpmn.behavior.BpmnActivityBehavior.performDefaultOutgoingBehavior(BpmnActivityBehavior.java:66)
at org.activiti.engine.impl.bpmn.behavior.FlowNodeActivityBehavior.leave(FlowNodeActivityBehavior.java:44)
at org.activiti.engine.impl.bpmn.behavior.AbstractBpmnActivityBehavior.leave(AbstractBpmnActivityBehavior.java:47)
Unfortunately, it is highly likely that the engine erases the information about the execution when the execute method returns, even though no complete/leave/take has been called. Even though my callback has the execution object in context, when I query for information using its process ID all I receive is null.
So, what am I doing wrong here? How can I achieve the behaviour that I want?
I don't see anything specific. I would have said you need to extend a class that implements SignalableActivityBehavior, but I think TaskActivityBehavior actually does this.
While the stack indicates the NPE is coming from the leave(), I am confused why leave is calling "take" since take is a transition event and really should only happen on a task labeled as synchronous.
All I can offer is, Camunda have an example implementation that is similar to your scenario. You may be able to use this to help you:
https://github.com/camunda/camunda-bpm-examples/tree/master/servicetask/service-invocation-asynchronous
It seems that Activiti uses thread-local variables, which means that calling engine methods from the Scala threads (the Scala ExecutionContext) is pointless, since they do not share the context.
To solve this, all I have to do from my callback is make a signal call, much as if I were calling from a remote system. The only difference is that I do not need to save my process instance identifier.
The code looks like this:
class AsynchronousServiceTask extends AbstractBpmnActivityBehavior {
val exec_id : String = "executionId"
override def execute(execution : ActivityExecution) = {
val future = Future { println("Something") }
future onComplete {
case _ => myobject.callSignalForMe(execution.getId)
}
}
override def signal(execution : ActivityExecution, signalName : String, signalData : AnyRef) = {
println("Signal called, leaving current activity..")
leave(execution)
}
}
Basically, myobject holds the runTimeEngine and will inject the signal in a ThreadLocal context. All clean and working as intended.
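The thread-local explanation above can be illustrated without Activiti at all. In the sketch below, a hypothetical ThreadLocal named context stands in for the engine's per-thread state (it is not an Activiti API), and a Future running on a separate pool thread tries to read it.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ThreadLocalSketch {
  // Hypothetical stand-in for an engine's per-thread command context.
  val context = new ThreadLocal[String]

  def seenFromPool(): Option[String] = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      context.set("command-context") // visible on the calling thread only
      // The Future body runs on the pool thread, where nothing was set,
      // so context.get returns null there.
      Await.result(Future(Option(context.get)), 5.seconds)
    } finally {
      context.remove()
      pool.shutdown()
    }
  }
}
```

seenFromPool() yields None, which is the same effect as the callback finding no engine context. This is why going back through a signal call, which sets up its own context, works.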

Racy tail recursive function

I am trying to do some processing on a SynchronizedQueue using a tail recursive function. The function seems to work properly but the more I think about concurrency the more I believe I could have some race conditions when accessing this queue with different threads. Here is the function that I think I could use some help with:
val unsavedMessages = new SynchronizedQueue[CachedMessage]()
val MAX_BATCH = 256
val rowCount = new AtomicInteger()
private def validateCacheSize() = if (unsavedMessages.length > MAX_BATCH) {
implicit val batch = createBatch
val counter = rowCount.getAndIncrement
@tailrec
def processQueue(queue: SynchronizedQueue[CachedMessage]): Unit = if (queue.nonEmpty) {
val cm = queue.dequeue
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
processQueue(queue)
}
processQueue(unsavedMessages)
executeBatch
resetQueue
}
def resetQueue = unsavedMessages.clear
Multiple threads call this function:
def add(request: WebserviceRuleMatch, timestamp: Long, brokerId: String) = {
validateCacheSize
//log.info("enquing request "+ unsavedMessages.length)
unsavedMessages.enqueue(CachedMessage(request, timestamp, brokerId))
}
Does anyone have any pointers on how to improve this so there would likely not be a race condition?
There could be a chance that the queue gets emptied between queue.nonEmpty and queue.dequeue
Avoid calling multiple queue operations that must be synchronized within your code. Use the power of SynchronizedQueue to do atomic thread-safe operations. E.g. avoid calling queue.nonEmpty altogether (alternative to tail-recursion):
for (cm <- unsavedMessages.dequeueAll(_ => true))
addToBatch(cm.request, cm.timestamp, cm.brokerId, counter)
executeBatch
//resetQueue -- Don't do this! Not thread-safe
I think messages could be added by a thread between processQueue and resetQueue
There will always be a point at which your code has taken a 'snapshot' of the queue and emptied it. My previous point ensured that the 'snapshot' and emptying are a single atomic operation. If new entries are enqueued at any point after that atomic 'snapshot & empty' operation - no problem. Your 'snapshot & empty' must occur somewhere and new items enqueued are a fact of life. Make the decision to allow new items to be enqueued at any point subsequent to the 'snapshot & empty'. They'll be processed on next cycle. i.e. nothing extra needed beyond above point.
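The same 'snapshot & empty' idea can be sketched without SynchronizedQueue, which is deprecated in later Scala versions. The BatchBuffer class below is a hypothetical replacement, not part of the original code. It guards a plain mutable.Queue with a single lock, so that draining is one atomic step.

```scala
import scala.collection.mutable

// Hypothetical buffer: every operation takes the same lock, so drain()
// cannot interleave with enqueue() or with another drain().
final class BatchBuffer[A] {
  private val queue = mutable.Queue.empty[A]

  def enqueue(a: A): Unit = synchronized { queue.enqueue(a) }

  def size: Int = synchronized { queue.size }

  // Atomically takes everything currently queued and empties the queue.
  def drain(): List[A] = synchronized {
    val snapshot = queue.toList
    queue.clear()
    snapshot
  }
}
```

Anything enqueued after drain() returns simply waits for the next drain, which matches the "processed on next cycle" behaviour described above.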
Robin Green: (By the way, that method seems to have a very misleading name!)
Wot he said! :)
The add function gets called from a future, so I feel as though there could be a chance that the queue gets emptied between queue.nonEmpty and queue.dequeue.
Yes, it could. You could use double-checked locking to make validateCacheSize single-threaded. (By the way, that method seems to have a very misleading name!)
Also I think messages could be added by a thread between processQueue and resetQueue.
Yes, they could. But why do you need to call unsavedMessages.clear at all? queue.dequeue already removes them from the queue. So the only unsavedMessages that should exist in the queue then are ones that still remain to be processed.

Querying a continuously running operation for its current state/value in Scala

I have a procedure that continuously updates a value. I want to be able to periodically query the operation for the current value. In my particular example, every update can be considered an improvement and the procedure will eventually converge on a final, best answer, but I want/need access to the intermediate results. The speed with which the loop executes and the time it takes to converge matters.
As an example, consider this loop:
var current = 0
while(current < 100){
current = current + 1
}
I want to be able to get the value of current on any loop iteration.
A solution with an Actor would be:
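import akka.actor.Actor
case object Update // assumed message: perform one more update step
case object Query // assumed message: ask for the current value
class UpdatingActor extends Actor{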
var current : Int = 0
def receive = {
case Update => {
current = current + 1
if (current < 100) self ! Update
}
case Query => sender ! current
}
}
You could get rid of the var using become or FSM, but this example is more clear IMO.
Alternatively, one actor could run the operation and send updated results on every loop iteration to another actor, whose sole responsibility is updating the value and responding to queries about it. I don't know much about "agents" in Akka, but this seems like a potential use case for one.
What are better/alternative ways of doing this using Scala? I don't need to use actors; that was just one solution that came to mind.
Your actor-based solution is ok.
Sending the intermediate result after each change to a "result provider" actor would be a good idea as well if the calculation blocks the actor for a long time and you want to make sure that you can always get the intermediate result. Another alternative would be to make the actual calculator actor a child of the actor that collects the best result. That way the thing acts as a single actor from the outside, and you have the actor that has state (the current best result) separated from the actor that does the computation, which might fail.
An agent would be a solution somewhere between the very low-level @volatile/AtomicInteger approach and an Actor. An agent is something that can only be modified by running a transform on it (and there is a queue for transforms), but which has a current state that can always be accessed. It is not location transparent though, so stay with the actor approach if you need that.
Here is how you would solve this with an agent. You have one thread which does a long-running calculation (simulated by Thread.sleep) and another thread that just prints out the best current result in regular intervals (also simulated by Thread.sleep).
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.concurrent._
import akka.agent.Agent
object Main extends App {
val agent = Agent(0)
def computation() : Unit = {
for(i<-0 until 100) {
agent.send { current =>
Thread.sleep(1000) // to simulate a long-running computation
current + 1
}
}
}
def watch() : Unit = {
while(true) {
println("Current value is " + agent.get)
Thread.sleep(1000)
}
}
global.execute(new Runnable {
def run() = computation
})
watch()
}
But all in all I think an actor-based solution would be superior. For example you could do the calculation on a different machine than the result tracking.
The scope of the question is a little wide, but I'll try :)
First, your example is perfectly fine; I don't see the point of getting rid of the var. This is what actors are for: protecting mutable state.
Second, based on what you describe you don't need an actor at all.
class UpdatingActor {
@volatile private var current = 0 // @volatile so that the querying thread sees the updates
def startCrazyJob() {
while(current < 100){
current = current + 1
}
}
def soWhatsGoingOn: Int = current
}
You just need one thread to call startCrazyJob and a second one that will periodically call soWhatsGoingOn.
IMHO, the actor approach is better, but it's up to you to decide if it's worth importing the akka library just for this use case.
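For comparison, here is a minimal sketch of the non-actor variant using an AtomicInteger, so that a reader on another thread is guaranteed to see the writer's progress. The names ProgressSketch, Job, run and progress are invented for this illustration.

```scala
import java.util.concurrent.atomic.AtomicInteger

object ProgressSketch {
  final class Job(limit: Int) {
    private val current = new AtomicInteger(0)

    // Intended to be called from a single worker thread.
    def run(): Unit = while (current.get < limit) current.incrementAndGet()

    // Safe to call from any thread at any time.
    def progress: Int = current.get
  }
}
```

One thread calls run() while another polls progress at whatever interval it likes. Once run() has returned, progress equals the limit.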