I want to read multiple big files using Akka Streams and process each line. Imagine that each line consists of an (identifier -> value) pair. If a new identifier is found, I want to save it and its value in the database; otherwise, if the identifier has already been found while processing the stream of lines, I want to save only the value. For that, I think I need some kind of recursive stateful flow that keeps the identifiers already found in a Map. I think this flow would receive a pair of (newLine, contextWithIdentifiers).
I've just started to look into Akka Streams. I guess I can manage the stateless processing myself, but I have no clue how to keep the contextWithIdentifiers. I'd appreciate any pointers in the right direction.
Maybe something like statefulMapConcat can help you:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Sink, Source}
import scala.util.Random._
import scala.math.abs
import scala.concurrent.ExecutionContext.Implicits.global
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
//encapsulating your input
case class IdentValue(id: Int, value: String)
//some random generated input
val identValues = List.fill(20)(IdentValue(abs(nextInt()) % 5, "valueHere"))
val stateFlow = Flow[IdentValue].statefulMapConcat{ () =>
//state with already processed ids
var ids = Set.empty[Int]
identValue => if (ids.contains(identValue.id)) {
//save value to DB
println(identValue.value)
List(identValue)
} else {
//save both to database
println(identValue)
ids = ids + identValue.id
List(identValue)
}
}
Source(identValues)
.via(stateFlow)
.runWith(Sink.seq)
.onSuccess { case identValue => println(identValue) }
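The decision the flow above makes for each element can also be written as a pure fold, which may help when testing the rule without a running stream. This is only a sketch: the names Line, SaveBoth, SaveValue and classify are hypothetical, and the two Action cases merely stand in for the two database writes.

```scala
object SeenIdentifiersSketch {
  // Hypothetical input shape: one parsed line is an (identifier, value) pair.
  final case class Line(id: String, value: String)

  // Hypothetical outcomes standing in for the two kinds of database write.
  sealed trait Action
  final case class SaveBoth(id: String, value: String) extends Action
  final case class SaveValue(value: String) extends Action

  // Threads the set of already-seen identifiers through the fold,
  // so no mutable state is visible outside this function.
  def classify(lines: Seq[Line]): Vector[Action] = {
    val (_, actions) =
      lines.foldLeft((Set.empty[String], Vector.empty[Action])) {
        case ((seen, acc), line) =>
          if (seen.contains(line.id)) (seen, acc :+ SaveValue(line.value))
          else (seen + line.id, acc :+ SaveBoth(line.id, line.value))
      }
    actions
  }
}
```

The fold carries the same state as the var inside statefulMapConcat, only as an explicit accumulator instead of a closed-over variable.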
A few years later, here is an implementation I wrote if you only need a 1-to-1 mapping (not 1-to-N):
import akka.stream.stage.{GraphStage, GraphStageLogic}
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
object StatefulMap {
def apply[T, O](converter: => T => O) = new StatefulMap[T, O](converter)
}
class StatefulMap[T, O](converter: => T => O) extends GraphStage[FlowShape[T, O]] {
val in = Inlet[T]("StatefulMap.in")
val out = Outlet[O]("StatefulMap.out")
val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) {
val f = converter
setHandler(in, () => push(out, f(grab(in))))
setHandler(out, () => pull(in))
}
}
Test (and demo):
behavior of "StatefulMap"
class Counter extends (Any => Int) {
var count = 0
override def apply(x: Any): Int = {
count += 1
count
}
}
it should "not share state among substreams" in {
val result = await {
Source(0 until 10)
.groupBy(2, _ % 2)
.via(StatefulMap(new Counter()))
.fold(Seq.empty[Int])(_ :+ _)
.mergeSubstreams
.runWith(Sink.seq)
}
result.foreach(_ should be(1 to 5))
}
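The test above passes because converter is a by-name parameter: the expression new Counter() is evaluated again each time createLogic reads it, so every materialization gets its own counter. A minimal sketch of just that mechanism, without Akka (Stage and materialize are hypothetical stand-ins for the graph stage and createLogic):

```scala
object ByNameStateSketch {
  // Same shape as the Counter in the test above: a function with internal state.
  final class Counter extends (Any => Int) {
    private var count = 0
    override def apply(x: Any): Int = {
      count += 1
      count
    }
  }

  // Hypothetical stand-in for the graph stage: `converter` is by-name, so each
  // call to materialize() re-evaluates the expression passed at construction.
  final class Stage[T, O](converter: => T => O) {
    def materialize(): T => O = converter
  }

  val stage = new Stage[Any, Int](new Counter)
  val first: Any => Int = stage.materialize()
  val second: Any => Int = stage.materialize()
}
```

Calling first twice and then second once yields 1, 2 and 1: the second function starts from a fresh Counter.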
I am quite new to Akka Streams, whereas I have some experience with Kafka Streams.
One thing it seems lacking in Akka Streams is the possibility to join together two different streams.
Kafka Streams allows joining information coming from two different streams (or tables) using the messages' keys.
Is there something similar in Akka Streams?
The short answer is unfortunately no. I would argue that Akka Streams is lower level than Kafka Streams, Spark Streaming, or Flink. However, you have more control over what you are doing, which means that you can build your own join operator. Check this discussion at lightbend.
Basically, you have to read data from two Sources, merge them, send the merged elements to a window based on time or number of tuples, compute the join, and emit the result to the Sink. I have written this PoC following those steps; it compiles and runs, but it is still unfinished: I still have to join the data inside the window. Currently, I am just emitting the elements in mini-batches.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.{Attributes, ClosedShape, FlowShape, Inlet, Outlet}
import akka.stream.scaladsl.{Flow, GraphDSL, Merge, RunnableGraph, Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler, TimerGraphStageLogic}
import scala.collection.mutable
import scala.concurrent.duration._
object StreamOpenGraphJoin {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystem("StreamOpenGraphJoin")
val incrementSource: Source[Int, NotUsed] = Source(1 to 10).throttle(1, 1 second)
val decrementSource: Source[Int, NotUsed] = Source(10 to 20).throttle(1, 1 second)
def tokenizerSource(key: Int) = {
Flow[Int].map { value =>
(key, value)
}
}
// Step 1 - setting up the fundamental for a stream graph
val switchJoinStrategies = RunnableGraph.fromGraph(
GraphDSL.create() { implicit builder =>
import GraphDSL.Implicits._
// Step 2 - add partition and merge strategy
val tokenizerShape00 = builder.add(tokenizerSource(0))
val tokenizerShape01 = builder.add(tokenizerSource(1))
val mergeTupleShape = builder.add(Merge[(Int, Int)](2))
val batchFlow = Flow.fromGraph(new BatchTimerFlow[(Int, Int)](5 seconds))
val sinkShape = builder.add(Sink.foreach[(Int, Int)](x => println(s" > sink: $x")))
// Step 3 - tying up the components
incrementSource ~> tokenizerShape00 ~> mergeTupleShape.in(0)
decrementSource ~> tokenizerShape01 ~> mergeTupleShape.in(1)
mergeTupleShape.out ~> batchFlow ~> sinkShape
// Step 4 - return the shape
ClosedShape
}
)
// run the graph and materialize it
val graph = switchJoinStrategies.run()
}
// step 0: define the shape
class BatchTimerFlow[T](silencePeriod: FiniteDuration) extends GraphStage[FlowShape[T, T]] {
// step 1: define the ports and the component-specific members
val in = Inlet[T]("BatchTimerFlow.in")
val out = Outlet[T]("BatchTimerFlow.out")
// step 3: create the logic
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new TimerGraphStageLogic(shape) {
// mutable state
val batch = new mutable.Queue[T]
var open = false
// step 4: define mutable state implement my logic here
setHandler(in, new InHandler {
override def onPush(): Unit = {
try {
val nextElement = grab(in)
batch.enqueue(nextElement)
Thread.sleep(50) // simulate an expensive computation
if (open) pull(in) // send demand upstream signal, asking for another element
else {
// forward the element to the downstream operator
emitMultiple(out, batch.dequeueAll(_ => true).to[collection.immutable.Iterable])
open = true
scheduleOnce(None, silencePeriod)
}
} catch {
case e: Throwable => failStage(e)
}
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
pull(in)
}
})
override protected def onTimer(timerKey: Any): Unit = {
open = false
}
}
// step 2: construct a new shape
override def shape: FlowShape[T, T] = FlowShape[T, T](in, out)
}
}
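The PoC above stops before the join itself. A minimal sketch of that step on one emitted mini-batch, assuming each element carries a join key and is tagged with the side it came from. The name joinWindow and the Either-based tagging are hypothetical and not part of the PoC, which tags elements with 0 or 1 instead.

```scala
object WindowJoinSketch {
  // Hypothetical element shape: Left/Right marks the originating source,
  // and each side carries a join key K next to its payload.
  def joinWindow[K, L, R](batch: Seq[Either[(K, L), (K, R)]]): Vector[(K, L, R)] = {
    val lefts: Vector[(K, L)] = batch.collect { case Left(l) => l }.toVector
    val rightsByKey: Map[K, Vector[R]] =
      batch.collect { case Right(r) => r }.toVector
        .groupBy(_._1)
        .map { case (k, rs) => k -> rs.map(_._2) }
    // Inner join: every left element pairs with every right element of the same key.
    for {
      (k, l) <- lefts
      r <- rightsByKey.getOrElse(k, Vector.empty[R])
    } yield (k, l, r)
  }
}
```

Elements whose key appears on only one side within the batch are dropped, which is the usual inner-join behaviour for a window; a different policy would need extra state across batches.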
I am working on a requirement to get stats about files stored in Linux using Scala.
We will pass the root directory as input, and our code will get the complete list of subdirectories under it.
Then, for each directory in the list, I will get the list of files, and for each file I will get the owner, group, permissions, lastModifiedTime, createdTime, and lastAccessTime.
The problem is how I can process the list of directories in parallel to get the stats of the files stored in each directory.
In the production environment we have 100000+ folders inside the root folder.
So my list contains 100000+ folders.
How can I parallelize my operation (file stats) over the list I have?
Since I am new to Scala, please help me with this requirement.
Sorry for posting without code snippet.
Thanks.
I ended up using Akka actors.
I made assumptions about your desired output so that the program would be simple and fast. The assumptions I made are that the output is JSON, the hierarchy is not preserved, and that multiple files are acceptable. If you don't like JSON, you can replace it with something else, but the other two assumptions are important for keeping the current speed and simplicity of the program.
There are some command line parameters you can set. If you don't set them, then defaults will be used. The defaults are contained in Main.scala.
The command line parameters are as follows:
(0) the root directory you are starting from; (no default)
(1) the timeout interval (in seconds) for all the timeouts in this program; (default is 60)
(2) the number of printer actors to use; this will be the number of log files created; (default is 50)
(3) the tick interval to use for the monitor actor; (default is 500)
For the timeout, keep in mind this is the value of the time interval to wait at the completion of the program. So if you run a small job and wonder why it is taking a minute to complete, it is because it is waiting for the timeout interval to elapse before closing the program.
Because you are running such a large job, it is possible that the default timeout of 60 is too small. If you are getting exceptions complaining about timeout, increase the timeout value.
Please note that if your tick interval is set too high, there is a chance your program will close prematurely.
To run, just start sbt in project folder, and type
runMain Main <canonical path of root directory>
I couldn't figure out how to get the group of a File in Java. You'll need to research that and add the relevant code to Entity.scala and TraverseActor.scala.
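One possible way to read the group is through the POSIX attribute view in java.nio. This is a sketch, assuming a file system that supports POSIX attributes (as Linux does); the name groupOf is hypothetical and the code is not wired into Entity.scala or TraverseActor.scala.

```scala
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFileAttributes
import scala.util.Try

object FileGroupSketch {
  // Returns the group name, or None where the file system has no POSIX view
  // (for example on Windows) or the file cannot be read.
  def groupOf(path: Path): Option[String] =
    Try(Files.readAttributes(path, classOf[PosixFileAttributes]).group().getName).toOption
}
```

The same PosixFileAttributes object also exposes owner() and permissions(), so one readAttributes call could cover all three fields.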
Also f.list() in TraverseActor.scala was sometimes coming back as null, which was why I wrapped it in an Option. You'll have to debug that issue to make sure you aren't failing silently on certain files.
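To see why a listing fails instead of getting a bare null, the listing could go through java.nio's Files.list, which throws a specific exception. A minimal sketch (listChildren is a hypothetical helper, not part of the program below):

```scala
import java.nio.file.{Files, Path}
import scala.util.Try

object ListDirectorySketch {
  // Files.list throws (NotDirectoryException, AccessDeniedException, ...) where
  // File.list() would return null, so the Failure carries the actual cause.
  def listChildren(dir: Path): Try[Vector[Path]] = Try {
    val stream = Files.list(dir)
    try {
      val it = stream.iterator()
      val children = Vector.newBuilder[Path]
      while (it.hasNext) children += it.next()
      children.result()
    } finally stream.close() // the directory stream holds an open handle
  }
}
```

Logging the Failure for each directory that cannot be listed would show whether the null results come from permissions, broken links, or something else.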
Now, here are the contents of all the files.
build.sbt
name := "stackoverflow20191110"
version := "0.1"
scalaVersion := "2.12.1"
libraryDependencies ++= Seq(
"io.circe" %% "circe-core",
"io.circe" %% "circe-generic",
"io.circe" %% "circe-parser"
).map(_ % "0.12.2")
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.16"
Entity.scala
import io.circe.Encoder
import io.circe.generic.semiauto._
sealed trait Entity {
def path: String
def owner: String
def permissions: String
def lastModifiedTime: String
def creationTime: String
def lastAccessTime: String
def hashCode: Int
}
object Entity {
implicit val entityEncoder: Encoder[Entity] = deriveEncoder
}
case class FileEntity(path: String, owner: String, permissions: String, lastModifiedTime: String, creationTime: String, lastAccessTime: String) extends Entity
object FileEntity {
implicit val fileentityEncoder: Encoder[FileEntity] = deriveEncoder
}
case class DirectoryEntity(path: String, owner: String, permissions: String, lastModifiedTime: String, creationTime: String, lastAccessTime: String) extends Entity
object DirectoryEntity {
implicit val directoryentityEncoder: Encoder[DirectoryEntity] = deriveEncoder
}
case class Contents(path: String, files: IndexedSeq[Entity])
object Contents {
implicit val contentsEncoder: Encoder[Contents] = deriveEncoder
}
Main.scala
import akka.actor.ActorSystem
import akka.pattern.ask
import akka.util.Timeout
import java.io.{BufferedWriter, File, FileWriter}
import ShutDownActor.ShutDownYet
import scala.concurrent.Await
import scala.concurrent.duration._
import scala.util.Try
object Main {
val defaultNumPrinters = 50
val defaultMonitorTickInterval = 500
val defaultTimeoutInS = 60
def main(args: Array[String]): Unit = {
val timeoutInS = Try(args(1).toInt).toOption.getOrElse(defaultTimeoutInS)
val system = ActorSystem("SearchHierarchy")
val shutdown = system.actorOf(ShutDownActor.props)
val monitor = system.actorOf(MonitorActor.props(shutdown, timeoutInS))
val refs = (0 until Try(args(2).toInt).toOption.getOrElse(defaultNumPrinters)).map{x =>
val name = "logfile" + x
(name, system.actorOf(PrintActor.props(name, Try(args(3).toInt).toOption.getOrElse(defaultMonitorTickInterval), monitor)))
}
val root = system.actorOf(TraverseActor.props(new File(args(0)), refs))
implicit val askTimeout = Timeout(timeoutInS seconds)
var isTimedOut = false
while(!isTimedOut){
Thread.sleep(30000)
val fut = (shutdown ? ShutDownYet).mapTo[Boolean]
isTimedOut = Await.result(fut, timeoutInS seconds)
}
refs.foreach{ x =>
val fw = new BufferedWriter(new FileWriter(new File(x._1), true))
fw.write("{}\n]")
fw.close()
}
system.terminate
}
}
MonitorActor.scala
import MonitorActor.ShutDown
import akka.actor.{Actor, ActorRef, Props, ReceiveTimeout, Stash}
import io.circe.syntax._
import scala.concurrent.duration._
class MonitorActor(shutdownActor: ActorRef, timeoutInS: Int) extends Actor with Stash {
context.setReceiveTimeout(timeoutInS seconds)
override def receive: Receive = {
case ReceiveTimeout =>
shutdownActor ! ShutDown
}
}
object MonitorActor {
def props(shutdownActor: ActorRef, timeoutInS: Int) = Props(new MonitorActor(shutdownActor, timeoutInS))
case object ShutDown
}
PrintActor.scala
import java.io.{BufferedWriter, File, FileWriter, PrintWriter}
import akka.actor.{Actor, ActorRef, Props, Stash}
import PrintActor.{Count, HeartBeat}
class PrintActor(name: String, interval: Int, monitorActor: ActorRef) extends Actor with Stash {
val file = new File(name)
override def preStart = {
val fw = new BufferedWriter(new FileWriter(file, true))
fw.write("[\n")
fw.close()
self ! Count(0)
}
override def receive: Receive = {
case Count(c) =>
context.become(withCount(c))
unstashAll()
case _ =>
stash()
}
def withCount(c: Int): Receive = {
case s: String =>
val fw = new BufferedWriter(new FileWriter(file, true))
fw.write(s)
fw.write(",\n")
fw.close()
if (c == interval) {
monitorActor ! HeartBeat
context.become(withCount(0))
} else {
context.become(withCount(c+1))
}
}
}
object PrintActor {
def props(name: String, interval: Int, monitorActor: ActorRef) = Props(new PrintActor(name, interval, monitorActor))
case class Count(count: Int)
case object HeartBeat
}
ShutDownActor.scala
import MonitorActor.ShutDown
import ShutDownActor.ShutDownYet
import akka.actor.{Actor, Props, Stash}
class ShutDownActor() extends Actor with Stash {
override def receive: Receive = {
case ShutDownYet => sender ! false
case ShutDown => context.become(canShutDown())
}
def canShutDown(): Receive = {
case ShutDownYet => sender ! true
}
}
object ShutDownActor {
def props = Props(new ShutDownActor())
case object ShutDownYet
}
TraverseActor.scala
import java.io.File
import akka.actor.{Actor, ActorRef, PoisonPill, Props, ReceiveTimeout}
import io.circe.syntax._
import scala.collection.JavaConversions
import scala.concurrent.duration._
import scala.util.Try
class TraverseActor(start: File, printers: IndexedSeq[(String, ActorRef)]) extends Actor{
val hash = start.hashCode()
val mod = hash % printers.size
val idx = if (mod < 0) -mod else mod
val myPrinter = printers(idx)._2
override def preStart = {
self ! start
}
override def receive: Receive = {
case f: File =>
val path = f.getCanonicalPath
val files = Option(f.list()).map(_.toIndexedSeq.map(x =>new File(path + "/" + x)))
val directories = files.map(_.filter(_.isDirectory))
directories.foreach(ds => processDirectories(ds))
val entities = files.map{fs =>
fs.map{ f =>
val path = f.getCanonicalPath
val owner = Try(java.nio.file.Files.getOwner(f.toPath).toString).toOption.getOrElse("")
val permissions = Try(java.nio.file.Files.getPosixFilePermissions(f.toPath).toString).toOption.getOrElse("")
val attributes = Try(java.nio.file.Files.readAttributes(f.toPath, "lastModifiedTime,creationTime,lastAccessTime"))
val lastModifiedTime = attributes.flatMap(a => Try(a.get("lastModifiedTime").toString)).toOption.getOrElse("")
val creationTime = attributes.flatMap(a => Try(a.get("creationTime").toString)).toOption.getOrElse("")
val lastAccessTime = attributes.flatMap(a => Try(a.get("lastAccessTime").toString)).toOption.getOrElse("")
if (f.isDirectory) DirectoryEntity(path, owner, permissions, lastModifiedTime, creationTime, lastAccessTime)
else FileEntity(path, owner, permissions, lastModifiedTime, creationTime, lastAccessTime)
}
}
directories match {
case Some(seq) =>
seq match {
case x+:xs =>
case IndexedSeq() => self ! PoisonPill
}
case None => self ! PoisonPill
}
entities.foreach(e => myPrinter ! Contents(f.getCanonicalPath, e).asJson.toString)
}
def processDirectories(directories: IndexedSeq[File]): Unit = {
def inner(fs: IndexedSeq[File]): Unit = {
fs match {
case x +: xs =>
context.actorOf(TraverseActor.props(x, printers))
processDirectories(xs)
case IndexedSeq() =>
}
}
directories match {
case x +: xs =>
self ! x
inner(xs)
case IndexedSeq() =>
}
}
}
object TraverseActor {
def props(start: File, printers: IndexedSeq[(String, ActorRef)]) = Props(new TraverseActor(start, printers))
}
I only tested on a small example, so it is possible this program will run into problems when running your job. If that happens, feel free to ask questions.
The goal is to implement a function with this signature
def bindedSinkAndSource[A]:(Sink[A, Any], Source[A, Any]) = ???
where the returned source emits whatever the sink receives.
My primary goal is to implement a websocket forwarder by means of the handleWebSocketMessages directive.
The forwarder graph is:
leftReceiver ~> rightEmitter
leftEmitter <~ rightReceiver
where leftReceiver and leftEmitter are the in and out of the left endpoint handler flow, and rightReceiver and rightEmitter are the in and out of the right endpoint handler flow.
For example:
import akka.NotUsed
import akka.http.scaladsl.model.ws.Message
import akka.http.scaladsl.server.Directive.addByNameNullaryApply
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server.Route
import akka.stream.scaladsl.Flow
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
def buildHandlers(): Route = {
val (leftReceiver, rightEmitter) = bindedSinkAndSource[Message];
val (rightReceiver, leftEmitter) = bindedSinkAndSource[Message];
val leftHandlerFlow = Flow.fromSinkAndSource(leftReceiver, leftEmitter)
val rightHandlerFlow = Flow.fromSinkAndSource(rightReceiver, rightEmitter)
pathPrefix("leftEndpointChannel") {
handleWebSocketMessages(leftHandlerFlow)
} ~
pathPrefix("rightEndpointChannel") {
handleWebSocketMessages(rightHandlerFlow)
}
}
All the ideas that came to me were frustrated by the fact that the handleWebSocketMessages(..) directive doesn't give access to the materialized value of the received flow.
I found a way to achieve the goal, but there could be shorter and easier ways. If you know one, please don't hesitate to add your knowledge.
import org.reactivestreams.Publisher
import org.reactivestreams.Subscriber
import org.reactivestreams.Subscription
import akka.NotUsed
import akka.stream.scaladsl.Sink
import akka.stream.scaladsl.Source
def bindedSinkAndSource[A]: (Sink[A, NotUsed], Source[A, NotUsed]) = {
class Binder extends Subscriber[A] with Publisher[A] { binder =>
var oUpStreamSubscription: Option[Subscription] = None;
var oDownStreamSubscriber: Option[Subscriber[_ >: A]] = None;
var pendingRequestFromDownStream: Option[Long] = None;
var pendingCancelFromDownStream: Boolean = false;
def onSubscribe(upStreamSubscription: Subscription): Unit = {
this.oUpStreamSubscription match {
case Some(_) => upStreamSubscription.cancel // rule 2-5
case None =>
this.oUpStreamSubscription = Some(upStreamSubscription);
if (pendingRequestFromDownStream.isDefined) {
upStreamSubscription.request(pendingRequestFromDownStream.get)
pendingRequestFromDownStream = None
}
if (pendingCancelFromDownStream) {
upStreamSubscription.cancel()
pendingCancelFromDownStream = false
}
}
}
def onNext(a: A): Unit = {
oDownStreamSubscriber.get.onNext(a)
}
def onComplete(): Unit = {
oDownStreamSubscriber.foreach { _.onComplete() };
this.oUpStreamSubscription = None
}
def onError(error: Throwable): Unit = {
oDownStreamSubscriber.foreach { _.onError(error) };
this.oUpStreamSubscription = None
}
def subscribe(downStreamSubscriber: Subscriber[_ >: A]): Unit = {
assert(this.oDownStreamSubscriber.isEmpty);
this.oDownStreamSubscriber = Some(downStreamSubscriber);
downStreamSubscriber.onSubscribe(new Subscription() {
def request(n: Long): Unit = {
binder.oUpStreamSubscription match {
case Some(usSub) => usSub.request(n);
case None =>
assert(binder.pendingRequestFromDownStream.isEmpty);
binder.pendingRequestFromDownStream = Some(n);
}
};
def cancel(): Unit = {
binder.oUpStreamSubscription match {
case Some(usSub) => usSub.cancel();
case None =>
assert(binder.pendingCancelFromDownStream == false);
binder.pendingCancelFromDownStream = true;
}
binder.oDownStreamSubscriber = None
}
})
}
}
val binder = new Binder;
val receiver = Sink.fromSubscriber(binder);
val emitter = Source.fromPublisher(binder);
(receiver, emitter);
}
Note that the instance vars of the Binder class may suffer concurrency problems if the sink and source this method creates are not fused later by the user. In that case, all accesses to these variables should be enclosed in synchronized blocks. Another solution would be to ensure that the sink and the source are materialized in an execution context with a single thread.
Two days later I discovered MergeHub and BroadcastHub. Using them the answer is much shorter:
import akka.stream.Materializer
def bindedSinkAndSource[T](implicit sm: Materializer): (Sink[T, NotUsed], Source[T, NotUsed]) = {
import akka.stream.scaladsl.BroadcastHub;
import akka.stream.scaladsl.MergeHub;
import akka.stream.scaladsl.Keep;
MergeHub.source[T](perProducerBufferSize = 8).toMat(BroadcastHub.sink[T](bufferSize = 256))(Keep.both).run()
}
with the advantage that the returned sink and source can be materialized multiple times.
I am trying to write a simple application based on akka-http and akka-streams that handles HTTP requests, always with one precompiled stream, because I plan to use long-running processing with back-pressure in my requestProcessor stream.
My application code:
import akka.actor.{ActorSystem, Props}
import akka.http.scaladsl._
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.server._
import akka.stream.ActorFlowMaterializer
import akka.stream.actor.ActorPublisher
import akka.stream.scaladsl.{Sink, Source}
import scala.annotation.tailrec
import scala.concurrent.Future
object UserRegisterSource {
def props: Props = Props[UserRegisterSource]
final case class RegisterUser(username: String)
}
class UserRegisterSource extends ActorPublisher[UserRegisterSource.RegisterUser] {
import UserRegisterSource._
import akka.stream.actor.ActorPublisherMessage._
val MaxBufferSize = 100
var buf = Vector.empty[RegisterUser]
override def receive: Receive = {
case request: RegisterUser =>
if (buf.isEmpty && totalDemand > 0)
onNext(request)
else {
buf :+= request
deliverBuf()
}
case Request(_) =>
deliverBuf()
case Cancel =>
context.stop(self)
}
@tailrec final def deliverBuf(): Unit =
if (totalDemand > 0) {
if (totalDemand <= Int.MaxValue) {
val (use, keep) = buf.splitAt(totalDemand.toInt)
buf = keep
use foreach onNext
} else {
val (use, keep) = buf.splitAt(Int.MaxValue)
buf = keep
use foreach onNext
deliverBuf()
}
}
}
object Main extends App {
val host = "127.0.0.1"
val port = 8094
implicit val system = ActorSystem("my-testing-system")
implicit val fm = ActorFlowMaterializer()
implicit val executionContext = system.dispatcher
val serverSource: Source[Http.IncomingConnection, Future[Http.ServerBinding]] = Http(system).bind(interface = host, port = port)
val mySource = Source.actorPublisher[UserRegisterSource.RegisterUser](UserRegisterSource.props)
val requestProcessor = mySource
.mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
.to(Sink.head[Int])
.run()
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
requestProcessor ! UserRegisterSource.RegisterUser(t)
???
}
}
}
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future.successful {
1
}
serverSource.to(Sink.foreach {
connection =>
connection handleWith Route.handlerFlow(route)
}).run()
}
I found a solution for creating a Source that can dynamically accept new items to process, but I can't find any solution for obtaining the result of the stream execution in my route.
The direct answer to your question is to materialize a new Stream for each HttpRequest and use Sink.head to get the value you're looking for. Modifying your code:
import akka.stream.scaladsl.Keep
val requestStream =
mySource.mapAsync(1)(fakeSaveUserAndReturnCreatedUserId)
//keep both materialized values: the source's ActorRef and the sink's Future
.toMat(Sink.head[Int])(Keep.both)
//.run() - don't materialize here
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
//materialize a new Stream here
val (requestProcessor, userIdFut) = requestStream.run()
requestProcessor ! UserRegisterSource.RegisterUser(t)
//get the result of the Stream
userIdFut onSuccess { case userId : Int => ...}
}
}
}
However, I think your question is ill posed. In your code example the only thing you're using an akka Stream for is to create a new UserId. Futures readily solve this problem without the need for a materialized Stream (and all the accompanying overhead):
val route: Route =
get {
path("test") {
parameter('test) { case t: String =>
val user = RegisterUser(t)
fakeSaveUserAndReturnCreatedUserId(user) onSuccess { case userId : Int =>
...
}
}
}
}
If you want to limit the number of concurrent calls to fakeSaveUserAndReturnCreatedUserId then you can create an ExecutionContext with a defined ThreadPool size, as explained in the answer to this question, and use that ExecutionContext to create the Futures:
import java.util.concurrent.Executors
import scala.concurrent.ExecutionContext
val ThreadCount = 10 //concurrent queries
val limitedExecutionContext =
ExecutionContext.fromExecutor(Executors.newFixedThreadPool(ThreadCount))
def fakeSaveUserAndReturnCreatedUserId(param: UserRegisterSource.RegisterUser): Future[Int] =
Future { 1 }(limitedExecutionContext)
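To check that a fixed thread pool really bounds the number of calls running at once, a small self-contained experiment could look like the following. It is a sketch only: maxConcurrency is a hypothetical helper, and Thread.sleep stands in for the blocking save.

```scala
import java.util.concurrent.Executors
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LimitedPoolSketch {
  // Runs `tasks` short jobs on a pool of `threads` threads and returns the
  // highest number of jobs that were observed running at the same time.
  def maxConcurrency(threads: Int, tasks: Int): Int = {
    val pool = Executors.newFixedThreadPool(threads)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    val running = new AtomicInteger(0)
    val lock = new Object
    var maxSeen = 0 // guarded by `lock`
    try {
      val jobs = (1 to tasks).map { _ =>
        Future {
          val now = running.incrementAndGet()
          lock.synchronized { if (now > maxSeen) maxSeen = now }
          Thread.sleep(20) // stands in for the blocking save
          running.decrementAndGet()
        }
      }
      Await.result(Future.sequence(jobs), 10.seconds)
      lock.synchronized(maxSeen)
    } finally pool.shutdown()
  }
}
```

With threads = 2 the returned value never exceeds 2, however many tasks are submitted; the remaining Futures wait in the executor's queue.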
I want to create a Play 2 Enumeratee that takes in values and outputs them, chunked together, every x seconds/milliseconds. That way, in a multi-user websocket environment with lots of user input, one could limit the number of received frames per second.
I know that it's possible to group a set number of items together like this:
val chunker = Enumeratee.grouped(
Traversable.take[Array[Double]](5000) &>> Iteratee.consume()
)
Is there a built-in way to do this based on time rather than based on the number of items?
I was thinking about doing this somehow with a scheduled Akka job, but at first sight this seems inefficient, and I'm not sure whether concurrency issues would arise.
How about like this? I hope this is helpful for you.
package controllers
import play.api._
import play.api.Play.current
import play.api.mvc._
import play.api.libs.iteratee._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Promise
object Application extends Controller {
def index = Action {
val queue = new scala.collection.mutable.Queue[String]
Akka.future {
while( true ){
Logger.info("hogehogehoge")
queue += System.currentTimeMillis.toString
Thread.sleep(100)
}
}
val timeStream = Enumerator.fromCallback { () =>
Promise.timeout(Some(queue), 200)
}
Ok.stream(timeStream.through(Enumeratee.map[scala.collection.mutable.Queue[String]]({ queue =>
var str = ""
while(queue.nonEmpty){
str += queue.dequeue + ", "
}
str
})))
}
}
And this document is also helpful for you.
http://www.playframework.com/documentation/2.0/Enumerators
UPDATE
This is for play2.1 version.
package controllers
import play.api._
import play.api.Play.current
import play.api.mvc._
import play.api.libs.iteratee._
import play.api.libs.concurrent.Akka
import play.api.libs.concurrent.Promise
import scala.concurrent._
import ExecutionContext.Implicits.global
object Application extends Controller {
def index = Action {
val queue = new scala.collection.mutable.Queue[String]
Akka.future {
while( true ){
Logger.info("hogehogehoge")
queue += System.currentTimeMillis.toString
Thread.sleep(100)
}
}
val timeStream = Enumerator.repeatM{
Promise.timeout(queue, 200)
}
Ok.stream(timeStream.through(Enumeratee.map[scala.collection.mutable.Queue[String]]({ queue =>
var str = ""
while(queue.nonEmpty){
str += queue.dequeue + ", "
}
str
})))
}
}
Here I've quickly defined an iteratee that will take values from an input for a fixed time length t measured in milliseconds and an enumeratee that will allow you to group and further process an input stream divided into segments constructed within such length t. It relies on JodaTime to keep track of how much time has passed since the iteratee began.
import org.joda.time.{Instant, Interval}
import play.api.libs.iteratee._
def throttledTakeIteratee[E](timeInMillis: Long): Iteratee[E, List[E]] = {
var startTime = new Instant()
def step(state: List[E])(input: Input[E]): Iteratee[E, List[E]] = {
val timePassed = new Interval(startTime, new Instant()).toDurationMillis
input match {
case Input.EOF => { startTime = new Instant; Done(state, Input.EOF) }
case Input.Empty => Cont[E, List[E]](i => step(state)(i))
case Input.El(e) =>
if (timePassed >= timeInMillis) { startTime = new Instant; Done(e::state, Input.Empty) }
else Cont[E, List[E]](i => step(e::state)(i))
}
}
Cont(step(List[E]()))
}
def throttledTake[T](timeInMillis: Long) = Enumeratee.grouped(throttledTakeIteratee[T](timeInMillis))
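The grouping rule the iteratee above follows (an element that arrives once the time span has elapsed closes the current group and restarts the clock) can be modelled as a pure function for testing. This is a sketch under assumptions: timestamps are passed in explicitly instead of being read from the clock, groupByElapsed is a hypothetical name, and unlike the iteratee, which prepends to a List, this model keeps elements in arrival order.

```scala
object TimedGroupingSketch {
  // stamped: (arrival time in millis, element) pairs in arrival order.
  // startMillis: the moment the first group started.
  def groupByElapsed[A](startMillis: Long, windowMillis: Long, stamped: Seq[(Long, A)]): Vector[Vector[A]] = {
    val (_, open, closed) =
      stamped.foldLeft((startMillis, Vector.empty[A], Vector.empty[Vector[A]])) {
        case ((start, current, done), (at, elem)) =>
          // The element that arrives after the window has elapsed is included
          // in the group it closes, and the next group starts at its timestamp.
          if (at - start >= windowMillis) (at, Vector.empty[A], done :+ (current :+ elem))
          else (start, current :+ elem, done)
      }
    // Whatever is still buffered at end of input becomes the last group.
    if (open.isEmpty) closed else closed :+ open
  }
}
```

For example, with a 100 ms window starting at 0 and arrivals at 0, 50, 120, 130, 260 and 270 ms, the groups contain three, two and one elements respectively.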