Akka SourceQueue to send list elements - scala

I have a List[String] and a Source.queue. I would like to offer the list's elements to this queue one at a time, at a fixed interval. Something like this:
val data: List[String] = ???
val tick = Source.tick(0.seconds, 1.second, "tick")
tick.runForeach(t => queue.offer(data(???))) // which element goes here?
Can someone help me out?
Edit: I have found a way, but I am looking for a more elegant one:
// take (not limit): limit would fail the stream once the infinite tick exceeds data.length
val tick = Source.tick(0.seconds, 2.seconds, "tick").zipWithIndex.take(data.length)
tick.runForeach(t => {
  queue.offer(data(t._2.toInt))
})

To send the elements of the List[String] to the queue one at a time, at a fixed interval, use Source#delay as follows:
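import akka.stream.{Attributes, DelayOverflowStrategy}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.duration._
// assumes `queue` (materialized from Source.queue) and an implicit Materializer are in scope
val data: List[String] = ???
Source(data)
  .delay(2.seconds, DelayOverflowStrategy.backpressure)
  .withAttributes(Attributes.inputBuffer(1, 1))
  .mapAsync(1)(x => queue.offer(x))
  .runWith(Sink.ignore)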
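Set the input buffer size to 1 with withAttributes: the default is 16, so the delay stage would otherwise pull in up to 16 elements at once and release them together once their delay has elapsed. Combined with DelayOverflowStrategy.backpressure, this lets elements through one at a time, each delayed by 2 seconds. Use mapAsync because the offer method returns a Future; with parallelism 1, each offer completes before the next one is made.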
Alternatively, use Source#throttle:
Source(data)
.throttle(1, 2.seconds, 1, ThrottleMode.Shaping)
.mapAsync(1)(x => queue.offer(x))
.runWith(Sink.ignore)
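To try either variant end to end, here is a minimal sketch that creates its own queue with Source.queue and collects whatever the queue's downstream receives. It assumes Akka 2.6 (where an implicit ActorSystem also provides the Materializer); the object name ThrottledOffer, the method sendThrottled and the buffer size of 16 are hypothetical choices for illustration, and the collecting Sink.seq only stands in for whatever the real queue feeds.

```scala
import akka.actor.ActorSystem
import akka.stream.{OverflowStrategy, ThrottleMode}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Future
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object ThrottledOffer {
  // Hypothetical helper: offers each element of `data` to a Source.queue, at most one
  // per `interval`, and returns what the queue's downstream received once it completes.
  // Assumes Akka 2.6, where the implicit ActorSystem also supplies the Materializer.
  def sendThrottled(data: List[String], interval: FiniteDuration)(
      implicit system: ActorSystem): Future[Seq[String]] = {
    import system.dispatcher

    // Stand-in for the question's queue: its downstream just collects the elements.
    val (queue, received) = Source
      .queue[String](16, OverflowStrategy.backpressure)
      .toMat(Sink.seq)(Keep.both)
      .run()

    Source(data)
      .throttle(1, interval, 1, ThrottleMode.Shaping) // one element per interval
      .mapAsync(1)(x => queue.offer(x))               // wait for each offer to be accepted
      .runWith(Sink.ignore)
      .onComplete { (result: Try[akka.Done]) =>
        result match {
          case Success(_) => queue.complete()         // every element has been offered
          case Failure(e) => queue.fail(e)
        }
      }

    received
  }
}
```

Completing the queue only after the throttled stream has finished means `received` resolves once every element has been offered; for the delay variant only the throttle line would change.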

Related

For loop containing Scala Futures modifying a List

Let's say I have a ListBuffer[Int] that I iterate over with foreach, where each iteration modifies the list from inside a Future (removing the current element) and does something special once the list is empty. Example code:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.collection.mutable.ListBuffer
val l = ListBuffer(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
l.foreach(n => Future {
println(s"Processing $n")
Future {
l -= n
println(s"Removed $n")
if (l.isEmpty) println("List is empty!")
}
})
This is probably going to end very badly. I have a more complex code with similar structure and same needs, but I do not know how to structure it so I can achieve same functionality in a more reliable way.
The way you present your problem is not really in the functional style that Scala encourages.
What you seem to want is to run a list of asynchronous computations, do something at the end of each one, and do something else once all of them have finished. This is fairly simple with continuations, which are easy to express with the map and flatMap methods on Future:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val fa: Future[Int] = Future { 1 }
// will apply the function to the result when it becomes available
val fb: Future[Int] = fa.map(a => a + 1)
// will start the asynchronous computation using the result when it becomes available
val fc: Future[Int] = fa.flatMap(a => Future { a + 2 })
Once you have all this, you can easily do something when each of your Future completes (successfully):
val myFutures: List[Future[Int]] = ???
myFutures.map(futInt => futInt.map(int => int + 2))
Here, I will add 2 to each value I get from the different asynchronous computations in the List.
You can also choose to wait for all the Futures in your list to complete by using Future.sequence:
val myFutureList: Future[List[Int]] = Future.sequence(myFutures)
Once again, you get a Future, which completes successfully when all of the Futures in the input list have completed successfully, or fails if any of them fails. You can then use map or flatMap on this new Future to work with all the computed values at once.
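A small runnable sketch of the map-then-sequence behaviour described above, assuming the global ExecutionContext; the values in SequenceDemo are hypothetical stand-ins for myFutures:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequenceDemo {
  // Hypothetical inputs standing in for `myFutures`.
  val myFutures: List[Future[Int]] = List(Future(1), Future(2), Future(3))

  // A continuation on each Future: add 2 once its value is available.
  val plusTwo: List[Future[Int]] = myFutures.map(futInt => futInt.map(int => int + 2))

  // A single Future for the whole list, completed when every element has succeeded.
  val all: Future[List[Int]] = Future.sequence(plusTwo)

  // If any element fails, the combined Future fails as well.
  val withFailure: Future[List[Int]] =
    Future.sequence(List(Future(1), Future.failed[Int](new RuntimeException("boom"))))
}
```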
So here's how I would write the code you proposed:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
val l = 1 to 10
val processings: Seq[Future[Unit]] = l.map { n =>
  Future(println(s"processing $n")).map { _ =>
    println(s"finished processing $n")
  }
}
val processingOver: Future[Unit] =
  Future.sequence(processings).map { (lu: Seq[Unit]) =>
    println(s"Finished processing ${lu.size} elements")
  }
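Mapping each element to a Future and then calling Future.sequence is common enough that the standard library also offers Future.traverse, which does both in one step. The sketch below applies it to the original scenario; process is a hypothetical placeholder for the real per-element work, and onEach plays the role of the per-element println.

```scala
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

object ProcessAll {
  // Hypothetical per-element work that returns a value instead of
  // removing the element from a shared mutable list.
  def process(n: Int): Future[Int] = Future(n * n)

  // Future.traverse(xs)(f) behaves like Future.sequence(xs.map(f)).
  // `onEach` runs as each element finishes (in no particular order); the returned
  // Future completes, with results in input order, once all elements are processed.
  def processAll(xs: List[Int])(onEach: Int => Unit): Future[List[Int]] =
    Future.traverse(xs) { n =>
      process(n).map { result =>
        onEach(n)
        result
      }
    }
}
```

For example, `ProcessAll.processAll((1 to 10).toList)(n => println(s"finished processing $n")).foreach(rs => println(s"Finished processing ${rs.size} elements"))` prints the per-element messages in whatever order the elements finish and the final message once, replacing the "List is empty!" check without any shared mutable state.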
Of course, I would recommend writing real functions rather than procedures (returning Unit), so that you have values to work with. I used println so that the code produces the same output as yours (apart from the messages themselves, which mean something slightly different since nothing is being mutated anymore).

How to log flow rate in Akka Stream?

I have an Akka Streams application with a single flow/graph. I want to measure the flow rate at the source and log it every 5 seconds, like 'received 3 messages in the last 5 seconds'. I tried:
someOtherFlow
.groupedWithin(Integer.MAX_VALUE, 5.seconds)
.runForeach(seq =>
log.debug(s"received ${seq.length} messages in the last 5 seconds")
)
but it only logs when there are messages; groupedWithin never emits an empty group when 0 messages arrive. I want the 0s as well. Is this possible?
You could try something like
src
  .conflateWithSeed(_ => 1) { case (acc, _) => acc + 1 }
  .zip(Source.tick(5.seconds, 5.seconds, NotUsed))
  .map(_._1)
which should batch your elements until the tick releases them. This is inspired by an example in the docs.
On a different note, if you need this for monitoring purposes, you could use a third-party tool such as Kamon.
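One caveat: conflateWithSeed only emits once it has aggregated at least one element, so a window with no elements produces no output rather than a 0, much like groupedWithin in the question. If the zeros matter, one possible approach (a sketch, not taken from the docs) is to merge the data with the ticks and count in statefulMapConcat. The names RateWithZeros and countPerWindow are hypothetical, and the code assumes Akka 2.6.

```scala
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Source}
import scala.concurrent.duration._

object RateWithZeros {
  // Hypothetical helper: counts the elements passing through in each `window`,
  // emitting 0 for windows in which nothing arrived.
  // Data elements are tagged `false`, ticks `true`. eagerComplete = true lets the
  // stream finish when the data source finishes (the endless tick source would
  // otherwise keep it open); the count of the last, partial window is then dropped.
  def countPerWindow[T](window: FiniteDuration): Flow[T, Int, NotUsed] =
    Flow[T]
      .map(_ => false)
      .merge(Source.tick(window, window, true), eagerComplete = true)
      .statefulMapConcat { () =>
        var count = 0
        isTick =>
          if (isTick) { val c = count; count = 0; List(c) } // end of window: emit and reset
          else { count += 1; Nil }                           // data element: just count it
      }
}
```

In the question's setup this could be used as `someOtherFlow.via(RateWithZeros.countPerWindow[Any](5.seconds)).runForeach(n => log.debug(s"received $n messages in the last 5 seconds"))`.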
A sample of logging in Akka Streams:
import akka.{Done, NotUsed}
import akka.actor.ActorSystem
import akka.event.{Logging, LoggingAdapter}
import akka.stream.{ActorMaterializer, Attributes}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.concurrent.duration._
import scala.util.{Failure, Random, Success}
implicit val system: ActorSystem = ActorSystem("StreamLoggingActorSystem")
implicit val materializer: ActorMaterializer = ActorMaterializer()
implicit val adapter: LoggingAdapter = Logging(system, "customLogger")
implicit val ec: ExecutionContextExecutor = system.dispatcher
def randomInt = Random.nextInt()
val source = Source.repeat(NotUsed).map(_ => randomInt)
val logger = source
  .groupedWithin(Integer.MAX_VALUE, 5.seconds)
  // logs the size of each 5-second group (groupedWithin does not emit empty groups)
  .log("in the last 5 seconds number of messages received : ", _.size)
  .withAttributes(
    Attributes.logLevels(
      onElement = Logging.WarningLevel,
      onFinish = Logging.InfoLevel,
      onFailure = Logging.DebugLevel
    )
  )
val sink = Sink.ignore
val result: Future[Done] = logger.runWith(sink)
result.onComplete {
  case Success(_) =>
    println("end of stream")
  case Failure(_) =>
    println("stream ended with failure")
}
source code is here.
Extending Stefano's answer a little, I created the following flows:
import akka.NotUsed
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.concurrent.duration._
def flowRate[T](metric: T => Int = (_: T) => 1, outputDelay: FiniteDuration = 1.second): Flow[T, Double, NotUsed] =
  Flow[T]
    .conflateWithSeed(metric(_)) { case (acc, x) => acc + metric(x) }
    .zip(Source.tick(outputDelay, outputDelay, NotUsed))
    .map(_._1.toDouble / outputDelay.toUnit(SECONDS))
// `log` is assumed to be a logger (e.g. a LoggingAdapter) available in scope
def printFlowRate[T](name: String, metric: T => Int = (_: T) => 1,
    outputDelay: FiniteDuration = 1.second): Flow[T, T, NotUsed] =
  Flow[T]
    .alsoTo(flowRate[T](metric, outputDelay)
      .to(Sink.foreach(r => log.info(s"Rate($name): $r"))))
The first flow converts the stream into a rate per second. You can supply a metric that assigns a value to each element passing through; for example, to measure the rate of characters in a flow of strings, pass _.length. The second parameter is the delay between flow rate reports (one second by default).
The second flow can be used inline to print the flow rate for debugging purposes without modifying the values passing through the stream, e.g.:
stringFlow
  .via(printFlowRate[String]("Char rate", _.length, 10.seconds))
  .map(_.toLowerCase) // still a string
...
which will log, every 10 seconds, the average rate of characters per second.
N.B. The flowRate above lags one outputDelay period behind, because zip consumes an element from conflate and then waits for a tick (easily verified by putting a log after conflateWithSeed). To obtain a non-lagging flow rate (metric), one could duplicate each tick so that zip is forced to consume a second, fresh element from conflate, and then add up the two counts, i.e.:
Flow[T]
  .conflateWithSeed(metric(_)) { case (acc, x) => acc + metric(x) }
  .zip(Source.tick(outputDelay, outputDelay, NotUsed)
    .mapConcat(_ => List(NotUsed, NotUsed)) // emit each tick twice
  )
  .grouped(2).map {
    case Seq((a, _), (b, _)) => a + b // add up the two counts for one period
  }
  .map(_.toDouble / outputDelay.toUnit(SECONDS))

Scala - Wait for all futures completed within time period

I have a List[Future[String]] and would like to wait a fixed period of time to collect the successful computations, and to rerun the operations whose futures do not complete within that period.
In pseudo code it will look like:
val inputData: List[String] = getInputData()
val futures : List[Future[String]] = inputData.map(toLongRunningIOOperation)
val (completedFutures, unfinishedFutures) = Await.ready(futures, 2 seconds)
val rerunedOperations : List[Future[String]] = unfinishedFutures.map(rerun)
Such a solution could be useful if you need to make several calls to external services whose latency is usually low (p99 < 60 ms), but where some requests take more than 5 seconds (because of the service's current state/load). In that situation it is better to rerun those requests (e.g. against another instance of the service).
As an example, use Future.firstCompletedOf to get a timed future:
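import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
// turns a failed Future into a successful Future(None)
def futureToFutureOption[T](f: Future[T]): Future[Option[T]] = f.map(Some(_)).recover[Option[T]]{case _ => None}
def foo(s: String): Future[String] = ???
val inputData: List[String] = List("a", "b", "c", "d")
val completedFuture = inputData.map {
  // "a" and "c" simulate slow calls: a 3-second computation races a 2-second "timeout";
  // the failing Future must sleep inside Future {...} (Future.failed would block the caller)
  case a @ ("a" | "c") => Future.firstCompletedOf(Seq(Future{Thread.sleep(3000); a},
    Future{Thread.sleep(2000); throw new RuntimeException("timeout")}))
  case a => Future(a)
}
// inputs whose Future failed (timed out)
val unfinished = Future.sequence(completedFuture.map(futureToFutureOption)).map(list => inputData.toSet -- list.flatten.toSet)
val rerunedOperations: Future[Set[Future[String]]] = unfinished.map { _.map(foo) }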
In this example, rerunedOperations has a different type than in your example, but I think that will be fine for you.
Also keep in mind that if the future performs a call to an external system and does not finish in time, this approach does not stop it: the original call to the external system keeps running while you make another one.
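A sketch of the same idea using only the standard library and a real timer instead of a sleeping Future: a Promise is failed by a scheduled task after the deadline and raced against each operation with Future.firstCompletedOf. The names TimedCollect, withTimeout and collectWithin are hypothetical; in an Akka application, akka.pattern.after could replace the hand-rolled scheduler. As noted above, a timed-out operation is not cancelled.

```scala
import java.util.concurrent.{Executors, ScheduledExecutorService, ThreadFactory, TimeUnit, TimeoutException}
import scala.concurrent.{ExecutionContext, Future, Promise}
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

object TimedCollect {
  // One daemon thread used only to fire timeouts, so it never keeps the JVM alive.
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor(new ThreadFactory {
      def newThread(r: Runnable): Thread = {
        val t = new Thread(r, "timeout-scheduler")
        t.setDaemon(true)
        t
      }
    })

  // Completes like `f`, or fails with a TimeoutException after `d`, whichever comes first.
  // The computation behind `f` is NOT cancelled; it keeps running in the background.
  def withTimeout[T](f: Future[T], d: FiniteDuration)(implicit ec: ExecutionContext): Future[T] = {
    val timeout = Promise[T]()
    scheduler.schedule(new Runnable {
      def run(): Unit = { timeout.tryFailure(new TimeoutException(s"not completed within $d")); () }
    }, d.toMillis, TimeUnit.MILLISECONDS)
    Future.firstCompletedOf(Seq(f, timeout.future))
  }

  // Starts `op` for every input and waits at most `d` for each result. Returns the
  // successful results and the inputs whose operation failed or timed out, so the
  // caller can rerun them (e.g. against another service instance).
  def collectWithin[A, B](inputs: List[A], d: FiniteDuration)(op: A => Future[B])(
      implicit ec: ExecutionContext): Future[(List[B], List[A])] = {
    val attempts: List[Future[(A, Try[B])]] =
      inputs.map(a => withTimeout(op(a), d).transform((t: Try[B]) => Success((a, t))))
    Future.sequence(attempts).map { results =>
      val completed  = results.collect { case (_, Success(b)) => b }
      val unfinished = results.collect { case (a, Failure(_)) => a }
      (completed, unfinished)
    }
  }
}
```

The returned pair corresponds to completedFutures and unfinishedFutures in the question's pseudocode; calling collectWithin again on the unfinished inputs (possibly with a different op) reruns them.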

scalaz-stream queue without hanging

I have a two-part question, so let me give some background first. I know that it is possible to do something similar to what I want like this:
import scalaz.concurrent._
import scalaz.stream._
val q = async.unboundedQueue[Int]
val p: Process[Task, Int] = q.dequeue
q.enqueueAll(1 to 2).run
val p1: Process1[Int, Int] = process1.take(1)
p.pipe(p1).map(x => println(s"Answer: $x")).run.run
// Answer: 1
p.pipe(p1).map(x => println(s"Answer: $x")).run.run
// Answer: 2
p.pipe(p1).map(x => println(s"Answer: $x")).run.run
// hangs awaiting next input
Is there some other p1 that I could use that would give me the output below without hanging (it would be like process1.awaitOption)?
Answer: Some(1)
Answer: Some(2)
Answer: None
If yes, I think it would be easy to answer the next question. Is there some other p1 that I could use that would give me the output below without hanging (it would be like process1.chunkAll)?
Answer: Seq(1, 2)
Answer: Seq()
Answer: Seq()
Edit:
To make the question clearer, suppose I have a loop like this:
for (i <- 1 to 4) {
p.pipe(p1).map(x => println(s"Answer: $x")).run.run
}
The result could be:
Answer: Seq()
// if someone pushes some values into the queue, like: q.enqueueAll(1 to 2).run
Answer: Seq(1, 2)
Answer: Seq()
Answer: Seq()
I hope it's clear now what I am trying to do. The problem is that I don't control the loop, and I must not block it if there are no values in the queue.
I am not sure I understand the semantics you are trying to achieve, but in general a process can be interrupted (that is, cancelled while it waits for a value) either by closing the queue externally or by using wye.interrupt.
When do you want the process to terminate instead of awaiting the next enqueued value? If, say, you want that to happen when the queue is empty, there is a "size" process you can use to interrupt the waiting dequeue once the size drops to zero, something like:
val empty : Process[Task,Boolean] = q.size.continuous.map(_ <= 0)
val deq : Process[Task,Int] = empty.wye(q.dequeue)(wye.interrupt)
Although I couldn't make Pavel's answer work the way I wanted, it was the turning point and I could use his advice to use the size signal.
I'm posting my answer here in case anyone needs it:
import scalaz.concurrent._
import scalaz.stream._
val q = async.unboundedQueue[Int]
val p: Process[Task, Int] = q.size.continuous.take(1).flatMap { n => q.dequeue |> process1.take(n) }
q.enqueueAll(1 to 2).run
p.map(x => println(s"Answer: $x")).run.run
// Answer: 1
// Answer: 2
p.map(x => println(s"Answer: $x")).run.run
// not hanging awaiting next input
p.map(x => println(s"Answer: $x")).run.run
// not hanging awaiting next input
q.enqueueAll(1 to 2).run
p.map(x => println(s"Answer: $x")).run.run
// Answer: 1
// Answer: 2
I realize that it's not exactly answering the question since I don't have an explicit p1, but it's fine for my purposes.

Scala, finding max value in arrays

This is the first time I've had to ask a question here; there is not enough info on Scala out there for a newbie like me.
Basically what I have is a file filled with hundreds of thousands of lists formatted like this:
(type, date, count, object)
Rows look something like this:
(food, 30052014, 400, banana)
(food, 30052014, 2, pizza)
All I need to do is find the one row with the highest count.
I know I did this a couple of months ago but can't seem to wrap my head around it now. I'm sure I can do this without a function too. All I want to do is set a value and put that row in it but I can't figure it out.
I think basically what I want to do is a Math.max on the 3rd element in the lists, but I just can't get it.
Any help will be kindly appreciated. Sorry if my wording or formatting of this question isn't the best.
EDIT: There's some extra info I've left out that I should probably add:
All the records are stored in a tsv file. I've done this to split them:
val split_food = food.map(_.split("\t"))
so basically I think I need to use split_food... somehow
Modified version of @Szymon's answer, with your edit addressed (note that a tab is written "\t", not "/t"):
val split_food = food.map(_.split("\t"))
val max_food = split_food.maxBy(tokens => tokens(2).toInt)
or, analogously:
val max_food = split_food.maxBy { case Array(_, _, count, _) => count.toInt }
If you're using Apache Spark's RDD, which lacks many of the usual Scala collection methods (such as maxBy), you can use reduce instead:
val max_food = split_food.reduce { (max: Array[String], current: Array[String]) =>
val curCount = current(2).toInt
val maxCount = max(2).toInt // you probably would want to preprocess all items,
// so .toInt will not be called again and again
if (curCount > maxCount) current else max
}
You should use maxBy function:
case class Purchase(category: String, date: Long, count: Int, name: String)
object Purchase {
def apply(s: String): Purchase = s.split("\t") match {
case Array(cat, date, count, name) => Purchase(cat, date.toLong, count.toInt, name)
}
}
foodRows.map(row => Purchase(row)).maxBy(_.count)
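Note that maxBy throws an exception on an empty collection, and a line with the wrong number of fields makes the match above throw a MatchError. A slightly more defensive sketch, assuming tab-separated lines in the (type, date, count, object) layout from the question; the names MaxCount, Row, parse and maxRow are hypothetical:

```scala
import scala.util.Try

object MaxCount {
  // One parsed row: (type, date, count, object), as in the question.
  final case class Row(kind: String, date: String, count: Int, item: String)

  // Parses a tab-separated line; None if it doesn't have exactly 4 fields
  // or if the count is not an integer.
  def parse(line: String): Option[Row] = line.split("\t") match {
    case Array(kind, date, count, item) =>
      Try(count.trim.toInt).toOption.map(c => Row(kind, date, c, item))
    case _ => None
  }

  // Returns the row with the highest count, or None if no line could be parsed.
  def maxRow(lines: Iterator[String]): Option[Row] = {
    val rows = lines.flatMap(line => parse(line))
    if (rows.hasNext) Some(rows.maxBy(_.count)) else None
  }
}
```

With a file, the lines could come from `scala.io.Source.fromFile(path).getLines()`, where `path` is the TSV file's location.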
Simply:
case class Record(food:String, date:String, count:Int)
val l = List(Record("ciccio", "x", 1), Record("buffo", "y", 4), Record("banana", "z", 3))
l.maxBy(_.count)
>>> res8: Record = Record(buffo,y,4)
Not sure if you got the answer yet, but I had the same issue with maxBy. I found that once I imported the package (import scala.io.Source), I was able to use maxBy and it worked.