Executing More than 3 Futures does not work - scala

I'm using the dispatch library in my sbt project. When I initialize three futures and run them, everything works perfectly. But when I add one more future, it gets stuck in a loop.
My code:
//Initializing Futures
def sequenceOfFutures() ={
var pageNumber: Int = 1
var list ={Seq(Future{})}
for (pageNumber <- 1 to 4) {
list ++= {
Seq(
Future {
str= getRequestFunction(pageNumber);
GlobalObjects.sleep(Random.nextInt(1500));
}
)
}
}
Future.sequence(list)
}
Await.result(sequenceOfFutures, Duration.Inf)
And then the getRequestFunction(pageNumber) code:
def getRequestFunction(pageNumber: Int) = {
val h = Http("scala.org", as_str)
while (h.isComplete) {
Thread.sleep(1500)
}
}
I tried a suggestion from "How to configure a fine tuned thread pool for futures?" and added this to my code:
import java.util.concurrent.Executors
import scala.concurrent._
implicit val ec = new ExecutionContext {
val threadPool = Executors.newFixedThreadPool(1000);
def execute(runnable: Runnable) {
threadPool.submit(runnable)
}
def reportFailure(t: Throwable) {}
}// Still didn't work
So when I use four or more futures, it awaits forever, and the custom thread pool above didn't help. Could someone please suggest how to solve this issue?
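The snippets above leave the cause open, so the following is only a sketch of one way to restructure the code, not a diagnosis. It assumes a hypothetical fetchPage function standing in for the dispatch request (it is not part of the dispatch API), runs the requests on a dedicated fixed pool, and collects the results with Future.traverse instead of appending to a var. The pool size of 8 and the 30-second timeout are arbitrary choices.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PageFetchSketch {
  // Hypothetical stand-in for the real HTTP call; it only simulates latency.
  def fetchPage(pageNumber: Int): String = {
    Thread.sleep(50)
    s"page-$pageNumber"
  }

  // Fetches every page on a dedicated pool and returns the results in page order.
  def fetchAll(pages: Range): List[String] = {
    val pool = Executors.newFixedThreadPool(8) // arbitrary size for the sketch
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)
    try {
      // One Future per page; traverse combines them into a single Future[List[String]].
      val all = Future.traverse(pages.toList)(p => Future(fetchPage(p)))
      Await.result(all, 30.seconds)
    } finally pool.shutdown() // release the pool's non-daemon threads
  }
}
```

Calling PageFetchSketch.fetchAll(1 to 4) returns one string per page. A bounded timeout instead of Duration.Inf at least turns a silent hang into a TimeoutException.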

Related

In akka streaming program w/ Source.queue & Sink.queue I offer 1000 items, but it just hangs when I try to get 'em out

I am trying to understand how I should be working with Source.queue & Sink.queue in Akka Streams.
In the little test program below, I am able to successfully offer 1000 items to the Source.queue.
However, when I wait on the future that should give me the results of pulling all those items off the queue, the
future never completes. Specifically, the message 'print what we pulled off the queue' that should appear at the end
never prints; instead I see the error "TimeoutException: Futures timed out after [10 seconds]".
Any guidance greatly appreciated!
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, Attributes}
import org.scalatest.FunSuite
import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
class StreamSpec extends FunSuite {
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "basis-test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
case class Req(name: String)
case class Response(
httpVersion: String = "",
method: String = "",
url: String = "",
headers: Map[String, String] = Map())
test("put items on queue then take them off") {
val source = Source.queue[String](128, akka.stream.OverflowStrategy.backpressure)
val flow = Flow[String].map(element => s"Modified $element")
val sink = Sink.queue[String]().withAttributes( Attributes.inputBuffer(128, 128))
val (sourceQueue, sinkQueue) = source.via(flow).toMat(sink)(Keep.both).run()
(1 to 1000).map( i =>
Future {
println("offerd" + i) // I see this print 1000 times as expected
sourceQueue.offer(s"batch-$i")
}
)
println("DONE OFFER FUTURE FIRING")
// Now use the Sink.queue to pull the items we added onto the Source.queue
val seqOfFutures: immutable.Seq[Future[Option[String]]] =
(1 to 1000).map{ i => sinkQueue.pull() }
val futureOfSeq: Future[immutable.Seq[Option[String]]] =
Future.sequence(seqOfFutures)
val seq: immutable.Seq[Option[String]] =
Await.result(futureOfSeq, 10.second)
// unfortunately our future times out here
println("print what we pulled off the queue:" + seq);
}
}
Looking at this again, I realize that I originally set up and posed my question incorrectly.
The test that accompanies my original question launches a wave
of 1000 futures, each of which tries to offer 1 item to the queue.
Then the second step in that test attempts to create a 1000-element sequence (seqOfFutures)
where each future is trying to pull a value from the queue.
My theory as to why I was getting time-out errors is that there was some kind of deadlock due to running
out of threads or due to one thread waiting on another but where the waited-on-thread was blocked,
or something like that.
I'm not interested in hunting down the exact cause at this point because I have corrected
things in the code below (see CORRECTED CODE).
In the new code the test that uses the queue is called:
"put items on queue then take them off (with async parallelism) - (3)".
In this test I have a set of 10 tasks which run in parallel to do the 'enqueue' operation.
Then I have another 10 tasks which do the dequeue operation, which involves not only taking
the item off the queue, but also calling stringModifyFunc, which introduces a 1 ms processing delay.
I also wanted to prove that I got some performance benefit from
launching tasks in parallel and having the task steps communicate by passing their results through a
queue, so test 3 runs as a timed operation, and I found that it takes 1.9 seconds.
Tests (1) and (2) do the same amount of work, but serially -- The first with no intervening queue, and the second
using the queue to pass results between steps. These tests run in 13.6 and 15.6 seconds respectively
(which shows that the queue adds a bit of overhead, but that this is overshadowed by the efficiencies of running tasks in parallel.)
CORRECTED CODE
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import akka.stream.{ActorMaterializer, Attributes, QueueOfferResult}
import org.scalatest.FunSuite
import scala.concurrent.duration._
import scala.concurrent.{Await, ExecutionContext, Future}
class Speco extends FunSuite {
implicit val actorSystem: ActorSystem = ActorSystem()
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val log: LoggingAdapter = Logging(actorSystem.eventStream, "basis-test")
implicit val ec: ExecutionContext = actorSystem.dispatcher
val stringModifyFunc: String => String = element => {
Thread.sleep(1)
s"Modified $element"
}
def setup = {
val source = Source.queue[String](128, akka.stream.OverflowStrategy.backpressure)
val sink = Sink.queue[String]().withAttributes(Attributes.inputBuffer(128, 128))
val (sourceQueue, sinkQueue) = source.toMat(sink)(Keep.both).run()
val offers: Source[String, NotUsed] = Source(
(1 to iterations).map { i =>
s"item-$i"
}
)
(sourceQueue,sinkQueue,offers)
}
val outer = 10
val inner = 1000
val iterations = outer * inner
def timedOperation[T](block : => T) = {
val t0 = System.nanoTime()
val result: T = block // call-by-name
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) / (1000 * 1000) + " milliseconds")
result
}
test("20k iterations in single threaded loop no queue (1)") {
timedOperation{
(1 to iterations).foreach { i =>
val str = stringModifyFunc(s"tag-${i.toString}")
System.out.println("str:" + str);
}
}
}
test("20k iterations in single threaded loop with queue (2)") {
timedOperation{
val (sourceQueue, sinkQueue, offers) = setup
val resultFuture: Future[Done] = offers.runForeach{ str =>
val itemFuture = for {
_ <- sourceQueue.offer(str)
item <- sinkQueue.pull()
} yield (stringModifyFunc(item.getOrElse("failed")) )
val item = Await.result(itemFuture, 10.second)
System.out.println("item:" + item);
}
val result = Await.result(resultFuture, 20.second)
System.out.println("result:" + result);
}
}
test("put items on queue then take them off (with async parallelism) - (3)") {
timedOperation{
val (sourceQueue, sinkQueue, offers) = setup
def enqueue(str: String) = sourceQueue.offer(str)
def dequeue = {
sinkQueue.pull().map{
maybeStr =>
val str = stringModifyFunc( maybeStr.getOrElse("failed2"))
println(s"dequeud value is $str")
}
}
val offerResults: Source[QueueOfferResult, NotUsed] =
offers.mapAsyncUnordered(10){ string => enqueue(string)}
val dequeueResults: Source[Unit, NotUsed] = offerResults.mapAsyncUnordered(10){ _ => dequeue }
val runAll: Future[Done] = dequeueResults.runForeach(u => u)
Await.result(runAll, 20.second)
}
}
}

Saving two RDDs in parallel

accessLogs.saveAsTextFile(outputDirectory1)
accessList.saveAsTextFile(outputDirectory2)
How can I save both RDDs in parallel rather than in series?
import scala.concurrent._
import scala.concurrent.duration._
val rdds = Seq(accessLogs, accessList)
val dirs = Seq(outputDirectory1, outputDirectory2)
import ExecutionContext.Implicits.global
val future = Future.sequence(
for ((rdd, dir) <- rdds zip dirs) yield Future(rdd.saveAsTextFile(dir))
)
//Await.ready(future, Duration.Inf) //to wait for rdds to be saved...
Note that, despite its name, the sequence method on the Future companion object does not run the futures one after another. Each Future starts executing as soon as the for-comprehension creates it, so the saves run in parallel; sequence only turns the Seq of Futures into a single Future of a Seq (essentially the applicative-functor sequence operation).
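To make that point concrete without a Spark cluster, here is a small sketch in which a hypothetical save function stands in for rdd.saveAsTextFile(dir). It is an assumption for illustration only and just sleeps. The elapsed time it reports depends on how many threads the global pool has available, so treat the timing as indicative.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSaveSketch {
  // Hypothetical stand-in for rdd.saveAsTextFile(dir): sleeps, then reports the directory.
  def save(dir: String): String = {
    // blocking signals that this thread is about to block, so the global pool may compensate.
    blocking { Thread.sleep(300) }
    dir
  }

  // Returns the directories written and the elapsed wall-clock time in milliseconds.
  def saveAll(dirs: List[String]): (List[String], Long) = {
    val t0 = System.nanoTime()
    // Each Future starts running as soon as it is constructed.
    val started: List[Future[String]] = dirs.map(d => Future(save(d)))
    // Future.sequence only combines the already-running futures into one.
    val done = Await.result(Future.sequence(started), 10.seconds)
    (done, (System.nanoTime() - t0) / 1000000)
  }
}
```

With two directories, the elapsed time should be close to one sleep rather than two when the pool can run both tasks at once.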
You can save them in threads.
new Thread() {
override def run(): Unit = {
accessLogs.saveAsTextFile(outputDirectory1)
}
}.start()
new Thread() {
override def run(): Unit = {
accessList.saveAsTextFile(outputDirectory2)
}
}.start()
saveAsTextFile doesn't return anything, so I am not sure why you are capturing a return value.

Scala: Parallel execution with ListBuffer appends doesn't produce expected outcome

I know I'm doing something wrong with mutable.ListBuffer, but I can't figure out how to fix it, and I'd like a proper explanation of the issue.
I simplified the code below to reproduce the behavior.
I'm basically trying to run functions in parallel that add elements to a list as my first list gets processed. I end up "losing" elements.
import java.util.Properties
import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{Await, Future}
import scala.concurrent.{ExecutionContext}
import ExecutionContext.Implicits.global
object MyTestObject {
var listBufferOfInts = new ListBuffer[Int]() // files that are processed
def runFunction(): Int = {
listBufferOfInts = new ListBuffer[Int]()
val inputListOfInts = 1 to 1000
val fut = Future.traverse(inputListOfInts) { i =>
Future {
appendElem(i)
}
}
Await.ready(fut, Duration.Inf)
listBufferOfInts.length
}
def appendElem(elem: Int): Unit = {
listBufferOfInts ++= List(elem)
}
}
MyTestObject.runFunction()
MyTestObject.runFunction()
MyTestObject.runFunction()
which returns:
res0: Int = 937
res1: Int = 992
res2: Int = 997
Obviously I would expect 1000 to be returned all the time. How can I fix my code to keep the "architecture" but make my ListBuffer "synchronized" ?
I don't know what the exact problem is, since you said you simplified it, but you still have an obvious race condition: multiple threads modify a single mutable collection, and that is very bad. As other answers pointed out, you need some locking so that only one thread can modify the collection at a time. If your calculations are heavy, appending results to a buffer in a synchronized way shouldn't notably affect performance, but when in doubt, always measure.
But synchronization is not strictly needed; you can avoid vars and mutable state altogether. Let each Future return its partial result and then merge the results into a list. In fact, Future.traverse does just that.
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
def runFunction: Int = {
val inputListOfInts = 1 to 1000
val fut: Future[List[Int]] = Future.traverse(inputListOfInts.toList) { i =>
Future {
// some heavy calculations on i
i * 4
}
}
val listOfInts = Await.result(fut, Duration.Inf)
listOfInts.size
}
Future.traverse already gives you an immutable list with all your results combined, no need to append them to a mutable buffer.
Needless to say, you will always get 1000 back.
# List.fill(10000)(runFunction).exists(_ != 1000)
res18: Boolean = false
I'm not sure the above shows what you are trying to do correctly. Maybe the issue is that you are actually sharing a var ListBuffer which you reinitialise within runFunction.
When I take this out I collect all the events I'm expecting correctly:
import java.util.Properties
import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{ Await, Future }
import scala.concurrent.{ ExecutionContext }
import ExecutionContext.Implicits.global
object BrokenTestObject extends App {
var listBufferOfInts = ( new ListBuffer[Int]() )
def runFunction(): Int = {
val inputListOfInts = 1 to 1000
val fut = Future.traverse(inputListOfInts) { i =>
Future {
appendElem(i)
}
}
Await.ready(fut, Duration.Inf)
listBufferOfInts.length
}
def appendElem(elem: Int): Unit = {
listBufferOfInts.append( elem )
}
BrokenTestObject.runFunction()
BrokenTestObject.runFunction()
BrokenTestObject.runFunction()
println(s"collected ${listBufferOfInts.length} elements")
}
If you really have a synchronisation issue you can use something like the following:
import java.util.Properties
import scala.collection.mutable.ListBuffer
import scala.concurrent.duration.Duration
import scala.concurrent.{ Await, Future }
import scala.concurrent.{ ExecutionContext }
import ExecutionContext.Implicits.global
class WrappedListBuffer(val lb: ListBuffer[Int]) {
def append(i: Int) {
this.synchronized {
lb.append(i)
}
}
}
object MyTestObject extends App {
var listBufferOfInts = new WrappedListBuffer( new ListBuffer[Int]() )
def runFunction(): Int = {
val inputListOfInts = 1 to 1000
val fut = Future.traverse(inputListOfInts) { i =>
Future {
appendElem(i)
}
}
Await.ready(fut, Duration.Inf)
listBufferOfInts.lb.length
}
def appendElem(elem: Int): Unit = {
listBufferOfInts.append( elem )
}
MyTestObject.runFunction()
MyTestObject.runFunction()
MyTestObject.runFunction()
println(s"collected ${listBufferOfInts.lb.size} elements")
}
Changing
listBufferOfInts ++= List(elem)
to
synchronized {
listBufferOfInts ++= List(elem)
}
makes it work. Could this become a performance issue? I'm still interested in an explanation and maybe a better way of doing things!
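As for the explanation: ListBuffer is not thread-safe, so when several threads append at the same moment, one append can overwrite another and elements go missing. If you want to keep the "append from inside each Future" shape without writing your own locking, one option is a concurrent collection from java.util.concurrent. This is a minimal sketch, assuming the order of the collected elements does not matter; the object and method names are only illustrative.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SafeAppendSketch {
  def runFunction(): Int = {
    // A fresh thread-safe queue per run, so there is no shared var to reset.
    val results = new ConcurrentLinkedQueue[Int]()
    val fut = Future.traverse((1 to 1000).toList) { i =>
      Future {
        results.add(i) // safe to call from many threads at once
        ()
      }
    }
    Await.ready(fut, 1.minute)
    results.size
  }
}
```

Each call to SafeAppendSketch.runFunction() should return 1000, because every add is applied atomically.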

Scheduled function not being called

To schedule a function to run every X seconds in Scala, this is what works for me:
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext
import ExecutionContext.Implicits.global
object Driver {
def main(args: Array[String]) {
val system = akka.actor.ActorSystem("system")
system.scheduler.schedule(0 seconds, 1 seconds)(println("beep"))
}
}
This implementation :
object Driver {
def main(args: Array[String]) {
val t = new java.util.Timer()
val task = new java.util.TimerTask {
def run() = println("Beep!")
}
t.schedule(task, 1000L, 1000L)
task.cancel()
}
}
And
import java.util.concurrent._
object Driver {
def main(args: Array[String]) {
val ex = new ScheduledThreadPoolExecutor(1)
val task = new Runnable {
def run() = println("Beep!")
}
val f = ex.scheduleAtFixedRate(task, 1, 1, TimeUnit.SECONDS)
f.cancel(false)
}
}
Neither of these runs as expected. They just hang and no output is displayed. What could be causing this? If I debug the code, it appears to run sometimes, so is this an environment-related issue?
I think it would be better not to use the Runnable interface and to use only Akka with Scala for this job. Use a Cancellable for the scheduled task. An easy and correct way to do this with actors, following the Akka documentation, looks like this:
import akka.actor.Actor
import akka.actor.Props
import scala.concurrent.duration._ // in Akka 2.0 this was akka.util.duration._
//Schedules to send the "foo"-message to the testActor after 50ms
system.scheduler.scheduleOnce(50 milliseconds, testActor, "foo")
//Schedules a function to be executed (send the current time) to the testActor after 50ms
system.scheduler.scheduleOnce(50 milliseconds) {
testActor ! System.currentTimeMillis
}
val Tick = "tick"
val tickActor = system.actorOf(Props(new Actor {
def receive = {
case Tick ⇒ //write here the function you want to execute
}
}))
//This will schedule to send the Tick-message
//to the tickActor after 0ms repeating every 50ms
val cancellable =
system.scheduler.schedule(0 milliseconds,
50 milliseconds,
tickActor,
Tick)
//This cancels further Ticks to be sent
cancellable.cancel()
This is a complete example that works, using Scala 2.11.6 and Akka 2.3.8:
package org.example
import akka.actor.{ ActorSystem, Props, Actor }
import scala.concurrent.duration._
import scala.language.postfixOps
/**
* Created by anquegi on 10/04/15.
*/
object ScheduledTaskScala extends App {
//Use the system's dispatcher as ExecutionContext
import system.dispatcher
val system = ActorSystem.create("sheduledtask");
val Tick = "tick"
val tickActor = system.actorOf(Props(new Actor {
def receive = {
case Tick ⇒ { Thread.sleep(1000); println("I'm executing a task"); }
}
}))
//This will schedule to send the Tick-message
//to the tickActor after 0ms repeating every 2 s
val cancellable =
system.scheduler.schedule(0 milliseconds,
2 seconds,
tickActor,
Tick)
Thread.sleep(10000)
cancellable.cancel()
}
The java.util.Timer's scheduler thread does not run as a daemon thread, so it may keep the application from terminating. So you should either
call t.cancel(), or
create your timer with isDaemon set true:
new java.util.Timer(true)
As for the second example, it's basically the same underlying problem: you should call ex.shutdown() to keep your application from hanging.
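Note also that both snippets in the question cancel the task immediately after scheduling it, before the initial one-second delay has elapsed, which would explain why nothing is printed. Here is a minimal sketch of the executor variant that lets the task fire a few times, then cancels it and shuts the pool down. The beep method, its parameters, and the use of a CountDownLatch are illustrative choices, not part of the original code.

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

object BeepSketch {
  // Runs a task every periodMillis until it has fired at least `ticks` times,
  // then shuts the pool down. Returns how many times the task actually ran.
  def beep(ticks: Int, periodMillis: Long): Int = {
    val ex = Executors.newScheduledThreadPool(1)
    val count = new AtomicInteger(0)
    val latch = new CountDownLatch(ticks)
    val task = new Runnable {
      def run(): Unit = {
        println("Beep!")
        count.incrementAndGet()
        latch.countDown()
      }
    }
    val handle = ex.scheduleAtFixedRate(task, periodMillis, periodMillis, TimeUnit.MILLISECONDS)
    latch.await()        // let the task fire before cancelling
    handle.cancel(false) // stop further runs
    ex.shutdown()        // release the non-daemon worker thread so the JVM can exit
    count.get()
  }
}
```

For example, BeepSketch.beep(3, 1000) prints "Beep!" at least three times and then returns, after which the application can terminate normally.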

Flaky onSuccess of Future.sequence

I wrote this method:
import scala.concurrent._
import ExecutionContext.Implicits.global
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
}
I was expecting Future.sequence to gather a List[Future] into a Future[List] and then wait for every future (f1 and f2 in my case) to complete before calling onSuccess on the Future[List] (seq in my case).
But after many runs of this code, it prints "List(1, 2)" only once in a while, and I can't figure out why it does not work as expected.
Try this:
import scala.concurrent._
import java.util.concurrent.Executors
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
implicit val exec = ExecutionContext.fromExecutor(Executors.newCachedThreadPool)
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
}
This will always print List(1, 2). The reason is simple: the exec above is an ExecutionContext backed by non-daemon threads, whereas in your example the ExecutionContext was the default one, implicitly taken from ExecutionContext.Implicits.global, which uses daemon threads.
Because those threads are daemon threads, the process does not wait for the seq future to complete before it terminates. The list is printed only if seq happens to complete before the main thread exits, and that does not always happen.
The application is exiting before the future completes.
You need to block until the future has completed. This can be achieved in a variety of ways, including changing the ExecutionContext, instantiating a new ThreadPool, Thread.sleep etc, or by using methods on scala.concurrent.Await
The simplest way for your code is to use Await.ready. This blocks until the future completes or the specified timeout expires. In the modified code below, the application waits up to 5 seconds for seq to complete before exiting.
Note also, the extra import scala.concurrent.duration so we can specify the time to wait.
import scala.concurrent._
import scala.concurrent.duration._
import java.util.concurrent.Executors
import scala.util.{ Success, Failure }
object FuturesSequence extends App {
val f1 = future {
1
}
val f2 = future {
2
}
val lf = List(f1, f2)
val seq = Future.sequence(lf)
seq.onSuccess {
case l => println(l)
}
Await.ready(seq, 5 seconds)
}
By using Await.result instead, you can skip the onSuccess method too, as it will return the resulting list to you.
Example:
val seq: List[Int] = Await.result(Future.sequence(lf), 5 seconds)
println(seq)
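On newer Scala versions the lowercase future { ... } helper and onSuccess are no longer available (they were deprecated and later removed), so here is a minimal sketch of the same Await.result approach written with Future { ... }. The object and method names are only illustrative.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FuturesSequenceSketch {
  def gather(): List[Int] = {
    val f1 = Future { 1 }
    val f2 = Future { 2 }
    // Blocks the calling thread until both futures finish, or throws after 5 seconds.
    Await.result(Future.sequence(List(f1, f2)), 5.seconds)
  }

  def main(args: Array[String]): Unit =
    println(gather()) // prints List(1, 2)
}
```

Because the main thread blocks on the combined future, the daemon threads of the global ExecutionContext get the chance to finish before the JVM exits.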