I'm trying to implement paging using Akka Streams. Currently I have:
case class SomeObject(id:Long, next_page:Option[Map[String,String]])
def chainRequests(uri: Uri): Future[Option[(Uri, T)]] = {
if (uri.isEmpty) return Future.successful(None)
val response: Future[Response[T]] = sendWithRetry(prepareRequest(HttpMethods.GET, uri)).flatMap(unmarshal)
response.map { resp =>
resp.next_page match {
case Some(next_page) => Some((Uri(next_page("uri")), resp.data))
case None => Some((Uri.Empty, resp.data))
}
}
}
Source.single(someObject).map(obj => Uri(s"object/${obj.id}")).map(uri => Source.unfoldAsync(uri)(chainRequests)).map(...some processing goes here)
The problem is that if I do source.take(1000) and there are many pages, downstream does not get new elements until Source.unfoldAsync finishes.
I also tried using a cycle in a Flow, like this:
val in = builder.add(Flow[Uri])
val out = builder.add(Flow[T])
val partition = builder.add(Partition[Response[T]](2, r => r.next_page match { case Some(_) => 1; case None => 0 }))
val mergeUri = builder.add(MergePreferred[Uri](1))
in ~> mergeUri ~> sendRequest ~> partition
mergeUri.preferred <~ extractNextUri <~ partition.out(1)
partition.out(0) ~> Flow[Response[T]].map(_.data) ~> out
FlowShape(in.in, out.out)
But the above code does not work.
I'm stuck creating my own GraphStage: unfoldAsync takes an initial element, but in a Flow I don't have a "first" element. Any suggestions?
Thanks
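I found a solution by writing my own GraphStage:
import akka.http.scaladsl.model.Uri
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.concurrent.{ExecutionContextExecutor, Future}
import scala.util.{Failure, Success, Try}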
final class PaginationGraphStage[S <: Uri, E](f: S => Future[Option[(S, E)]])(implicit ec: ExecutionContextExecutor)
extends GraphStage[FlowShape[S, E]]{
val in: Inlet[S] = Inlet[S]("PaginationGraphStage.in")
val out: Outlet[E] = Outlet[E]("PaginationGraphStage.out")
override val shape: FlowShape[S, E] = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
new GraphStageLogic(shape) with OutHandler with InHandler {
private[this] var state: S = _
private[this] var inFlight = 0
private[this] var asyncFinished = false
private[this] def todo: Int = inFlight
def futureCompleted(result: Try[Option[(Uri, E)]]): Unit = {
inFlight -= 1
result match {
case Failure(ex) => fail(out, ex)
case Success(None) =>
asyncFinished = true
complete(out)
case Success(Some((newS: S, elem: E))) if !newS.isEmpty =>
push(out, elem)
state = newS
case Success(Some((newS: Uri, elem: E))) =>
push(out, elem)
asyncFinished = true
if (isAvailable(in)) getHandler(in).onPush()
else completeStage()
}
}
private val futureCB = getAsyncCallback(futureCompleted)
private val invokeFutureCB: Try[Option[(S, E)]] => Unit = futureCB.invoke
private def pullIfNeeded(): Unit = {
if (!hasBeenPulled(in)) tryPull(in)
}
override def onUpstreamFinish(): Unit = {
if (todo == 0) completeStage()
}
def onPull(): Unit = {
if (state != null) {
asyncFinished = false
inFlight += 1
val future = f(state)
future.value match {
case None => future.onComplete(invokeFutureCB)
case Some(v) => futureCompleted(v)
}
} else {
pullIfNeeded()
}
}
override def onPush(): Unit = {
if (state == null) {
inFlight += 1
state = grab(in)
pullIfNeeded()
getHandler(out).onPull()
}
if (asyncFinished) {
inFlight += 1
state = grab(in)
pullIfNeeded()
}
}
setHandlers(in, out, this)
}
}
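For comparison, here is a minimal sketch of the same paging without a custom stage, under stated assumptions. It replaces `.map(uri => Source.unfoldAsync(...))` with `flatMapConcat`, so each page is flattened into the main stream as soon as its request completes, and it uses `Option` as the unfold state instead of a `Uri.Empty` sentinel. `Page`, `fetchPage` and the five-page limit are hypothetical stand-ins for `Response[T]` and `sendWithRetry`/`unmarshal`; this is not a drop-in replacement for the stage above.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PagingSketch {
  // Hypothetical page model: data plus an optional pointer to the next page.
  final case class Page(data: List[String], next: Option[Int])

  // Hypothetical fetcher: page n links to page n + 1 until page 5.
  def fetchPage(n: Int)(implicit ec: ExecutionContext): Future[Page] =
    Future(Page(List(s"item-$n-a", s"item-$n-b"), if (n < 5) Some(n + 1) else None))

  // The state is the next page to fetch; None ends the unfold, so no sentinel Uri is needed.
  def allPages(first: Int)(implicit ec: ExecutionContext): Source[List[String], NotUsed] =
    Source.unfoldAsync[Option[Int], List[String]](Some(first)) {
      case None    => Future.successful(None)
      case Some(n) => fetchPage(n).map(page => Some((page.next, page.data)))
    }

  def firstItems(count: Int): immutable.Seq[String] = {
    implicit val system: ActorSystem = ActorSystem("paging")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    import system.dispatcher
    try {
      val items = Source(List(1))           // e.g. one starting page per object
        .flatMapConcat(id => allPages(id))  // pages flow downstream as each request completes
        .mapConcat(identity)                // flatten each page into its items
        .take(count)                        // cancels upstream once enough items have arrived
        .runWith(Sink.seq)
      Await.result(items, 5.seconds)
    } finally system.terminate()
  }
}
```

Because `unfoldAsync` only issues a request when it is pulled, `take` stops the paging early instead of waiting for the last page.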
I'm not sure whether I chose the right title for my question.
I'm interested in why the collection in the companion object is defined. Am I mistaken that this collection will only ever contain one f? What I see is a collection with exactly one element.
Here's the Future I'm dealing with:
import scala.util.{Failure, Success, Try}
trait Future[+T] { self =>
def onComplete(callback: Try[T] => Unit): Unit
def map[U](f: T => U) = new Future[U] {
def onComplete(callback: Try[U] => Unit) =
self onComplete (t => callback(t.map(f)))
}
def flatMap[U](f: T => Future[U]) = new Future[U] {
def onComplete(callback: Try[U] => Unit) =
self onComplete { _.map(f) match {
case Success(fu) => fu.onComplete(callback)
case Failure(e) => callback(Failure(e))
} }
}
def filter(p: T => Boolean) =
map { t => if (!p(t)) throw new NoSuchElementException; t }
}
Its companion object:
object Future {
def apply[T](f: => T) = {
val handlers = collection.mutable.Buffer.empty[Try[T] => Unit]
var result: Option[Try[T]] = None
val runnable = new Runnable {
def run = {
val r = Try(f)
handlers.synchronized {
result = Some(r)
handlers.foreach(_(r))
}
}
}
(new Thread(runnable)).start()
new Future[T] {
def onComplete(f: Try[T] => Unit) = handlers.synchronized {
result match {
case None => handlers += f
case Some(r) => f(r)
}
}
}
}
}
In my head I was imagining something like the following instead of the above companion object (notice that I replaced val handlers ... with var handler ...):
object Future {
def apply[T](f: => T) = {
var handler: Option[Try[T] => Unit] = None
var result: Option[Try[T]] = None
val runnable = new Runnable {
val execute_when_ready: Try[T] => Unit = r => handler match {
case None => execute_when_ready(r)
case Some(f) => f(r)
}
def run = {
val r = Try(f)
handler.synchronized {
result = Some(r)
execute_when_ready(r)
}
}
}
(new Thread(runnable)).start()
new Future[T] {
def onComplete(f: Try[T] => Unit) = handler.synchronized {
result match {
case None => handler = Some(f)
case Some(r) => f(r)
}
}
}
}
}
So why does execute_when_ready lead to a stack overflow, while handlers.foreach does not? What does the collection offer that I can't get without it? And is it possible to replace the collection with something else in the companion object?
The collection is not in the companion object, it is in the apply method, so there is a new instance for each Future. It is there because there can be multiple pending onComplete handlers on the same Future.
Your implementation only allows a single handler, and onComplete silently replaces any existing one. That is a bad idea, because the caller has no way of knowing whether someone else has already registered an onComplete handler.
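As noted in the comments, the stack overflow happens because execute_when_ready calls itself whenever handler is None, with no mechanism to stop the recursion. It cannot wait for a handler to show up either: run holds the lock on handler (still None) while it recurses, and onComplete synchronizes on that same None, so no handler can be registered in the meantime.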
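As for replacing the collection: something has to hold several pending callbacks, so the collection cannot simply disappear, but its type can change. The sketch below is one possible variant, not the original code: it swaps the mutable `Buffer` for an immutable `List` in a `var`, synchronizes on a dedicated lock object that is never reassigned, and runs the callbacks outside the lock. The name `MiniFuture` is hypothetical, chosen only to avoid clashing with `scala.concurrent.Future`.

```scala
import scala.util.Try

trait MiniFuture[+T] {
  def onComplete(callback: Try[T] => Unit): Unit
}

object MiniFuture {
  def apply[T](body: => T): MiniFuture[T] = {
    val lock = new Object                      // dedicated monitor, never reassigned
    var handlers: List[Try[T] => Unit] = Nil   // pending callbacks, newest first
    var result: Option[Try[T]] = None

    new Thread(new Runnable {
      def run(): Unit = {
        val r = Try(body)
        // Publish the result and take the pending callbacks atomically.
        val pending = lock.synchronized {
          result = Some(r)
          val hs = handlers
          handlers = Nil
          hs
        }
        pending.reverse.foreach(_(r))          // registration order, outside the lock
      }
    }).start()

    new MiniFuture[T] {
      def onComplete(callback: Try[T] => Unit): Unit = {
        val done = lock.synchronized {
          result match {
            case None =>
              handlers = callback :: handlers  // not finished yet: remember the callback
              None
            case some => some
          }
        }
        done.foreach(callback)                 // already finished: call back right away
      }
    }
  }
}
```

Every callback registered before completion is run exactly once by the worker thread, and any callback registered afterwards is run immediately by the caller, so none are lost or overwritten.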
Suppose there is a stream of files to be processed, and a specific file should only be processed (consumed) when a condition is met.
For example: process a file named "bbb" only if the stream also contains a file named "aaa".
case class SomeFile(name: String)
What would be the correct(recommended) way to do this?
Okay, here's an example. Be careful not to build up too large a buffer before the trigger arrives.
import akka.NotUsed
import akka.actor.ActorSystem
import akka.event.LoggingAdapter
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Source}
import scala.concurrent.ExecutionContextExecutor
import scala.util.{Failure, Success}

class FileFinder {
  def matchFiles(triggerName: String,
                 matchName: String): Flow[SomeFile, SomeFile, NotUsed] =
    Flow[SomeFile].statefulMapConcat(
      statefulMatcher(matches(triggerName), matches(matchName)))

  private def matches(matchName: String): SomeFile => Boolean = {
    case SomeFile(name) if name == matchName => true
    case _ => false
  }

  // statefulMapConcat calls the outer function once per materialization,
  // so the mutable state is created inside it rather than shared between runs.
  private def statefulMatcher(
      triggerFilter: SomeFile => Boolean,
      sendFilter: SomeFile => Boolean): () => SomeFile => List[SomeFile] =
    () => {
      var found = false
      var sendFiles: List[SomeFile] = Nil // matches seen before the trigger, newest first

      (file: SomeFile) =>
        file match {
          case f if triggerFilter(f) =>
            found = true
            val send = sendFiles.reverse // release buffered files in arrival order
            sendFiles = Nil
            send
          case f if sendFilter(f) =>
            if (found) List(f)
            else {
              sendFiles = f :: sendFiles // buffer until the trigger shows up
              Nil
            }
          case _ => Nil
        }
    }
}

object FileFinder extends FileFinder {
  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("finder")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    implicit val executor: ExecutionContextExecutor =
      materializer.executionContext
    implicit val loggingAdapter: LoggingAdapter = system.log

    val files = List(SomeFile("aaa"), SomeFile("bbb"), SomeFile("aaa"))
    Source(files)
      .via(matchFiles("bbb", "aaa"))
      .runForeach(println(_))
      .andThen({
        case Success(_) =>
          println("Success")
          system.terminate()
        case Failure(ex) =>
          loggingAdapter.error(ex, "Shouldn't happen...")
          system.terminate()
      })
  }
}

case class SomeFile(name: String)
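To address the buffering caveat, here is a sketch of a hypothetical bounded variant of the matcher (the names `BoundedMatcher` and `maxBuffered` are illustrative, not from the answer). It keeps the same trigger/match logic but throws once more than `maxBuffered` matching files have been seen before the trigger. Used as `Flow[SomeFile].statefulMapConcat(BoundedMatcher("aaa", "bbb", 1000))`, with the nested `SomeFile` swapped for the top-level case class, the exception fails the stream under the default supervision strategy instead of letting memory grow without limit. It is written without Akka so the logic can be checked directly.

```scala
object BoundedMatcher {
  final case class SomeFile(name: String)

  // Returns a factory, as statefulMapConcat expects: fresh state per materialization.
  def apply(triggerName: String, matchName: String, maxBuffered: Int): () => SomeFile => List[SomeFile] =
    () => {
      var found = false
      var buffered = Vector.empty[SomeFile]     // matches seen before the trigger, in order

      (file: SomeFile) =>
        if (file.name == triggerName) {
          found = true
          val out = buffered.toList             // release everything buffered so far
          buffered = Vector.empty
          out
        } else if (file.name == matchName) {
          if (found) List(file)
          else if (buffered.size < maxBuffered) { buffered :+= file; Nil }
          else throw new IllegalStateException(
            s"more than $maxBuffered '$matchName' files before '$triggerName'")
        } else Nil
    }
}
```

Outside a stream, the returned function can be exercised with a plain `flatMap`: `List(...).flatMap(BoundedMatcher("aaa", "bbb", 10)())`.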
I am trying to test my sliding window stage using the Akka Streams TestKit, and I see this exception:
Exception in thread "main" java.lang.AssertionError: assertion failed: expected OnNext(Stream(2, ?)), found OnError(java.lang.IllegalArgumentException: Cannot push port (Sliding.out(2043106095)) twice, or before it being pulled
Akka, Akka Streams, Akka Streams TestKit version: 2.5.9
Scala version: 2.12.4
case class Sliding[T](duration: Duration, step: Duration, f: T => Long) extends GraphStage[FlowShape[T, immutable.Seq[T]]] {
val in = Inlet[T]("Sliding.in")
val out = Outlet[immutable.Seq[T]]("Sliding.out")
override val shape: FlowShape[T, immutable.Seq[T]] = FlowShape(in, out)
override protected val initialAttributes: Attributes = Attributes.name("sliding")
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = new GraphStageLogic(shape) with InHandler with OutHandler {
private var buf = Vector.empty[T]
var watermark = 0L
var dropUntilDuration = step.toMillis
private def isWindowDone(current: T) = {
if (buf.nonEmpty) {
val hts = f(buf.head)
val cts = f(current)
cts >= hts + duration.toMillis
} else false
}
override def onPush(): Unit = {
val data = grab(in)
val timeStamp = f(data)
if (timeStamp > watermark) {
watermark = timeStamp
if (isWindowDone(data)) {
push(out, buf)
buf = buf.dropWhile { x =>
val ts = f(x)
ts < dropUntilDuration
}
dropUntilDuration = dropUntilDuration + step.toMillis
}
buf :+= data
pull(in)
} else {
pull(in)
}
}
override def onPull(): Unit = {
pull(in)
}
override def onUpstreamFinish(): Unit = {
if (buf.nonEmpty) {
push(out, buf)
}
completeStage()
}
this.setHandlers(in, out, this)
}
}
Test code:
object WindowTest extends App {
implicit val as = ActorSystem("WindowTest")
implicit val m = ActorMaterializer()
val expectedResultIterator = Stream.from(1).map(_.toLong)
val infinite = Iterator.from(1)
Source
.fromIterator(() => infinite)
.map(_.toLong)
.via(Sliding(10 millis, 2 millis, identity))
.runWith(TestSink.probe[Seq[Long]])
.request(1)
.expectNext(expectedResultIterator.take(10).toSeq)
.request(1)
.expectNext(expectedResultIterator.take(11).drop(1).toSeq)
.expectComplete()
}
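A likely cause, judging from the stage code: onPush pushes whenever a window closes and then calls pull(in) unconditionally, while onPull pulls again as well. After the first window answers the single request(1), the stage keeps consuming, and at element 12 the next window closes and is pushed with no pending downstream pull, which is exactly the "Cannot push port ... twice, or before it being pulled" error. The sketch below keeps the original windowing arithmetic but pushes at most once per downstream pull and stops pulling upstream after a push. `SlidingFixed` and `SlidingFixedDemo` are hypothetical names, and `FiniteDuration` replaces `Duration` so that `toMillis` is always defined. Separately, the test's final expectComplete() cannot succeed on an infinite source; cancel() on the probe is one way to end it.

```scala
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, Attributes, FlowShape, Inlet, Outlet}
import akka.stream.scaladsl.{Sink, Source}
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

final case class SlidingFixed[T](duration: FiniteDuration, step: FiniteDuration, f: T => Long)
    extends GraphStage[FlowShape[T, immutable.Seq[T]]] {
  val in = Inlet[T]("SlidingFixed.in")
  val out = Outlet[immutable.Seq[T]]("SlidingFixed.out")
  override val shape: FlowShape[T, immutable.Seq[T]] = FlowShape(in, out)

  override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
    new GraphStageLogic(shape) with InHandler with OutHandler {
      private var buf = Vector.empty[T]
      private var watermark = 0L
      private var dropUntil = step.toMillis

      private def isWindowDone(current: T): Boolean =
        buf.nonEmpty && f(current) >= f(buf.head) + duration.toMillis

      override def onPush(): Unit = {
        val data = grab(in)
        var pushed = false
        if (f(data) > watermark) {             // late elements are ignored, as before
          watermark = f(data)
          if (isWindowDone(data)) {
            push(out, buf)                     // answers the onPull that started this pull chain
            pushed = true
            buf = buf.dropWhile(x => f(x) < dropUntil)
            dropUntil += step.toMillis
          }
          buf :+= data
        }
        if (!pushed) pull(in)                  // still owe downstream a window: keep reading
      }

      // After a push we wait here for new demand before reading more input.
      override def onPull(): Unit =
        if (!hasBeenPulled(in)) pull(in)

      // emit waits for demand if downstream has not pulled yet.
      override def onUpstreamFinish(): Unit =
        if (buf.nonEmpty) emit(out, buf, () => completeStage())
        else completeStage()

      setHandlers(in, out, this)
    }
}

object SlidingFixedDemo {
  def firstTwoWindows(): immutable.Seq[immutable.Seq[Long]] = {
    implicit val system: ActorSystem = ActorSystem("SlidingFixedDemo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val windows = Source(1L to 30L)
        .via(SlidingFixed[Long](10.millis, 2.millis, identity))
        .take(2)
        .runWith(Sink.seq)
      Await.result(windows, 3.seconds)
    } finally system.terminate()
  }
}
```

With the push/pull bookkeeping fixed, the first two windows are 1..10 and 2..11, which is what the test expects.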
I have a GraphStage, taken from https://stackoverflow.com/a/40962834/772249, that looks like this. It is used in a WebSocket server:
class TerminateFlowStage[T](
predicate: T => Boolean,
forwardTerminatingMessage: Boolean = false,
terminate: Boolean = true)
extends GraphStage[FlowShape[T, T]]
{
val in = Inlet[T]("TerminateFlowStage.in")
val out = Outlet[T]("TerminateFlowStage.out")
override val shape = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
new GraphStageLogic(shape) {
setHandlers(in, out, new InHandler with OutHandler {
override def onPull(): Unit = { pull(in) }
override def onPush(): Unit = {
val chunk = grab(in)
if (predicate(chunk)) {
if (forwardTerminatingMessage) {
push(out, chunk)
}
if (terminate) {
failStage(new RuntimeException("Flow terminated by TerminateFlowStage"))
} else {
completeStage()
}
} else {
push(out, chunk)
}
}
})
}
}
val termOnKillMe = new TerminateFlowStage[Message]((chunk: Message) => chunk match {
case TextMessage.Strict(text) => text.toInt > 5
case _ => false
})
val route =
path("") {
get {
extractUpgradeToWebSocket {
upgrade =>
complete(upgrade.handleMessagesWithSinkSource(
Sink.ignore,
Source(1 to 10).
map(i => TextMessage(i.toString)).throttle(1, 1.second, 1, ThrottleMode.shaping).via(termOnKillMe)
))
}
}
}
So after 5 messages, the WebSocket server drops the connection.
I have this flow for the WebSocket client:
val flow: Flow[Message, Message, Future[Seq[Message]]] =
Flow.fromSinkAndSourceMat(
Sink.seq[Message],
Source.maybe[Message])(Keep.left)
val (upgradeResponse, promise) =
Http().singleWebSocketRequest(
WebSocketRequest("ws://localhost:8080/"),
flow.recoverWithRetries(1, {
case _ => Source.empty
})
)
The problem is that the WebSocket client flow cannot recover from the exception raised by TerminateFlowStage. I get Future(Failure(akka.http.scaladsl.model.ws.PeerClosedConnectionException: Peer closed connection with code 1011 'internal error'))
Without the exception, everything works fine.
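One likely explanation: flow.recoverWithRetries only recovers failures on the flow's output, which here is the Source.maybe side (messages sent to the server). The failure arrives on the inbound side and fails Sink.seq, whose Future is the materialized value kept by Keep.left. A sketch of moving the recovery in front of Sink.seq is shown below; the helper name `collectingSink` is hypothetical, and the failing source only simulates the server closing the connection. In the client it would be plugged in as `Flow.fromSinkAndSourceMat(collectingSink[Message], Source.maybe[Message])(Keep.left)`.

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}
import scala.collection.immutable
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object RecoverBeforeSink {
  // Recovery sits on the inbound stream, before the elements reach Sink.seq.
  def collectingSink[T]: Sink[T, Future[immutable.Seq[T]]] =
    Flow[T]
      .recoverWithRetries(1, { case _ => Source.empty[T] }) // turn the failure into completion
      .toMat(Sink.seq[T])(Keep.right)

  // Simulated inbound side: five messages, then the connection "fails".
  def demo(): (immutable.Seq[String], Boolean) = {
    implicit val system: ActorSystem = ActorSystem("recover-demo")
    implicit val mat: ActorMaterializer = ActorMaterializer()
    try {
      val inbound = Source(1 to 5).map(_.toString)
        .concat(Source.failed[String](new RuntimeException("peer closed connection")))
      val recovered = Await.result(inbound.runWith(collectingSink[String]), 3.seconds)
      val plainFailed =
        Await.ready(inbound.runWith(Sink.seq[String]), 3.seconds).value.exists(_.isFailure)
      (recovered, plainFailed)
    } finally system.terminate()
  }
}
```

The recovering sink completes with the messages received before the failure, while a plain Sink.seq fails its Future, matching the behavior described above.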
I have Kryo-serialized binary data stored on S3 (thousands of serialized objects).
Alpakka lets me read the content as data: Source[ByteString, NotUsed]. But the Kryo format doesn't use delimiters, so I can't split each serialized object into a separate ByteString using data.via(Framing.delimiter(...)).
Kryo actually has to read the data to know where an object ends, which doesn't look streaming-friendly.
Is it possible to implement this in a streaming fashion, so that I end up with a Source[MyObject, NotUsed]?
Here is a graph stage that does this. It handles the case where a serialized object spans two ByteStrings. It would need to be improved for large objects (not my use case) that can span more than two ByteStrings in the Source[ByteString, NotUsed].
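import akka.NotUsed
import akka.stream.{Attributes, FlowShape, Inlet, Outlet}
import akka.stream.scaladsl.Flow
import akka.stream.stage.{GraphStage, GraphStageLogic, InHandler, OutHandler}
import akka.util.ByteString
import com.esotericsoftware.kryo.{Kryo, KryoException, Serializer}
import com.esotericsoftware.kryo.io.Input
import scala.collection.immutable
import scala.collection.mutable.ListBuffer
object KryoReadStage {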
def flow[T](kryoSupport: KryoSupport,
`class`: Class[T],
serializer: Serializer[_]): Flow[ByteString, immutable.Seq[T], NotUsed] =
Flow.fromGraph(new KryoReadStage[T](kryoSupport, `class`, serializer))
}
final class KryoReadStage[T](kryoSupport: KryoSupport,
`class`: Class[T],
serializer: Serializer[_])
extends GraphStage[FlowShape[ByteString, immutable.Seq[T]]] {
override def shape: FlowShape[ByteString, immutable.Seq[T]] = FlowShape.of(in, out)
override def createLogic(inheritedAttributes: Attributes): GraphStageLogic = {
new GraphStageLogic(shape) {
setHandler(in, new InHandler {
override def onPush(): Unit = {
val bytes =
if (previousBytes.length == 0) grab(in)
else ByteString.fromArrayUnsafe(previousBytes) ++ grab(in)
Managed(new Input(new ByteBufferBackedInputStream(bytes.asByteBuffer))) { input =>
var position = 0
val acc = ListBuffer[T]()
kryoSupport.withKryo { kryo =>
var last = false
while (!last && !input.eof()) {
tryRead(kryo, input) match {
case Some(t) =>
acc += t
position = input.total().toInt
previousBytes = EmptyArray
case None =>
val bytesLeft = new Array[Byte](bytes.length - position)
val bb = bytes.asByteBuffer
bb.position(position)
bb.get(bytesLeft)
last = true
previousBytes = bytesLeft
}
}
push(out, acc.toList)
}
}
}
private def tryRead(kryo: Kryo, input: Input): Option[T] =
try {
Some(kryo.readObject(input, `class`, serializer))
} catch {
case _: KryoException => None
}
})
setHandler(out, new OutHandler {
override def onPull(): Unit = {
pull(in)
}
})
private val EmptyArray: Array[Byte] = Array.empty
private var previousBytes: Array[Byte] = EmptyArray
}
}
override def toString: String = "KryoReadStage"
private lazy val in: Inlet[ByteString] = Inlet("KryoReadStage.in")
private lazy val out: Outlet[immutable.Seq[T]] = Outlet("KryoReadStage.out")
}
Example usage:
client.download(BucketName, key)
.via(KryoReadStage.flow(kryoSupport, `class`, serializer))
.flatMapConcat(Source(_))
It uses the following helpers.
ByteBufferBackedInputStream:
class ByteBufferBackedInputStream(buf: ByteBuffer) extends InputStream {
override def read(): Int = {
if (!buf.hasRemaining) -1
else buf.get & 0xFF
}
override def read(bytes: Array[Byte], off: Int, len: Int): Int = {
if (!buf.hasRemaining) -1
else {
val read = Math.min(len, buf.remaining)
buf.get(bytes, off, read)
read
}
}
}
Managed:
object Managed {
type AutoCloseableView[T] = T => AutoCloseable
def apply[T: AutoCloseableView, V](resource: T)(op: T => V): V =
try {
op(resource)
} finally {
resource.close()
}
}
KryoSupport:
trait KryoSupport {
def withKryo[T](f: Kryo => T): T
}
class PooledKryoSupport(serializers: (Class[_], Serializer[_])*) extends KryoSupport {
override def withKryo[T](f: Kryo => T): T = {
pool.run(new KryoCallback[T] {
override def execute(kryo: Kryo): T = f(kryo)
})
}
private val pool = {
val factory = new KryoFactory() {
override def create(): Kryo = {
val kryo = new Kryo
(KryoSupport.ScalaSerializers ++ serializers).foreach {
case ((clazz, serializer)) =>
kryo.register(clazz, serializer)
}
kryo
}
}
new KryoPool.Builder(factory).softReferences().build()
}
}
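The core idea of KryoReadStage (prepend the leftover bytes to the next chunk, parse as many complete objects as possible, keep the unparsed tail) can be isolated and tested without Kryo or Akka. The sketch below is hypothetical: `ChunkDecoder` and `lengthPrefixedString` are illustrative names, and the one-byte length prefix is a toy format chosen only so the parser is self-contained. With Kryo, `tryParse` would wrap `kryo.readObject` and report the number of bytes consumed. Because the whole tail is kept, a record spanning several chunks is also handled.

```scala
// Decodes records from arbitrarily split chunks, carrying incomplete bytes forward.
final class ChunkDecoder[A](tryParse: (Array[Byte], Int) => Option[(A, Int)]) {
  private var leftover: Array[Byte] = Array.emptyByteArray

  /** Feeds one chunk and returns every record that is now complete. */
  def feed(chunk: Array[Byte]): List[A] = {
    val bytes = leftover ++ chunk
    val out = List.newBuilder[A]
    var pos = 0
    var more = true
    while (more) {
      tryParse(bytes, pos) match {
        case Some((a, next)) => out += a; pos = next // complete record: advance
        case None            => more = false         // incomplete tail: wait for more bytes
      }
    }
    leftover = bytes.drop(pos)
    out.result()
  }

  def remaining: Int = leftover.length
}

object ChunkDecoder {
  // Toy format: one length byte followed by that many UTF-8 bytes.
  def lengthPrefixedString(bytes: Array[Byte], pos: Int): Option[(String, Int)] =
    if (pos >= bytes.length) None
    else {
      val len = bytes(pos) & 0xFF
      val end = pos + 1 + len
      if (end > bytes.length) None
      else Some((new String(bytes, pos + 1, len, "UTF-8"), end))
    }
}
```

In a stream, `feed` plays the role of the body of `onPush`, and the returned list is what the stage pushes downstream.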