How to get InputStream from request in Play - scala

Ithink this used to be possible in Play 1.x, but I can't find how to do it in Play 2.x
I know that Play is asynchronous and uses Iteratees. However, there is generally much better support for InputStreams.
(In this case, I will be using a streaming JSON parser like Jackson to process the request body.)
How can I get an InputStream from a chunked request body?

I was able to get this working with the following code:
// I think all of the parens and braces line up -- copied/pasted from code
val pos = new PipedOutputStream()
val pis = new PipedInputStream(pos)
val result = Promise[Either[Errors, String]]()
def outputStreamBodyParser = {
BodyParser("outputStream") {
requestHeader =>
val length = requestHeader.headers(HeaderNames.CONTENT_LENGTH).toLong
Future {
result.completeWith(saveFile(pis, length)) // save returns Future[Either[Errors, String]]
}
Iteratee.fold[Array[Byte], OutputStream](pos) {
(os, data) =>
os.write(data)
os
}.map {
os =>
os.close()
Right(os)
}
}
}
Action.async(parse.when(
requestHeaders => {
val maybeContentLength = requestHeaders.headers.get(HeaderNames.CONTENT_LENGTH)
maybeContentLength.isDefined && allCatch.opt(maybeContentLength.get.toLong).isDefined
},
outputStreamBodyParser,
requestHeaders => Future.successful(BadRequest("Missing content-length header")))) {
request =>
result.future.map {
case Right(fileRef) => Ok(fileRef)
case Left(errors) => BadRequest(errors)
}
}

Play 2 is meant to be fully asynchronous, so this isn't easily possible or desirable. The problem with InputStream is there is no push back, there is no way for the reader of the InputStream to communicate to the input that it wants more data without blocking on read. Technically it is possible to write an Iteratee that could read data and put it into an InputStream, and would wait for a call to read on the InputStream before asking the Enumerator for more data, but it would be dangerous. You would have to make sure that the InputStream was closed properly, or the Enumerator would sit waiting forever (or until it times out) and the call to read must be made from a thread that is not running on the same ExecutionContext as the Enumerator and Iteratee or the application could deadlock.

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
public void eventsReceived(List events) {
for (Event e : events)
buffer.enqueue(e)
}
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
val subscription = createSubscription()
subscription.addListener(
new Listener() {
def eventsReceived(events: java.util.List[Event]): Unit = {
import scala.collection.JavaConverters.iterableAsScalaIterable
import akka.stream.QueueOfferResult._
iterableAsScalaIterable(events).foreach { event =>
queue.offer(event) match {
case Enqueued => () // do nothing
case Dropped => ??? // handle a dropped pubsub element, might well do nothing
case QueueClosed => ??? // presumably cancel the subscription...
}
}
}
}
)
source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
Great answer! I did build something similar. There are also kamon metrics to monitor queue size exc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
private val logger = LoggerFactory.getLogger(getClass)
def bufferSize: Int = 1000
def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
Source.fromMaterializer { (mat, attr) =>
val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)
val receiver: MessageReceiver = {
(message: PubsubMessage, consumer: AckReplyConsumer) => {
metricsRegistry.inputEventQueueSize.update(queue.size())
queue.offer((message, consumer)) match {
case QueueOfferResult.Enqueued =>
metricsRegistry.inputQueueAddEventCounter.increment()
case QueueOfferResult.Dropped =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
case QueueOfferResult.Failure(ex) =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
case QueueOfferResult.QueueClosed =>
logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
consumer.nack()
mat.shutdown()
sys.exit(1)
}
}
}
val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
subscriber.startAsync().awaitRunning()
source.withAttributes(attr)
}
}
}

NullPointerException in Flink custom SourceFunction

I wanted to create a SourceFunction which reads a http stream.
I used ScalaJ which does what I want (it splits the incoming text by \n-s).
Obviously the code works outside Flink, but I get a NullPointerExcetion every time I start it as a Flink job (sometimes immediately sometimes after 1-2 seconds after it transmitted 1-2 elements). It kind of looks like the Http object has some problems.
import org.apache.flink.streaming.api.functions.source.SourceFunction
import scala.io.Source.fromInputStream
import scalaj.http._
class HttpSource(url: String) extends SourceFunction[String] {
#volatile var isRunning = true
override def cancel(): Unit = isRunning = false
override def run(ctx: SourceFunction.SourceContext[String]): Unit =
httpStream(ctx.collect)
private def httpStream(f: String => Unit) = {
val request = Http(url)
request
.execute { inputStream =>
fromInputStream(inputStream)
.getLines()
.takeWhile(_ => isRunning)
.foreach(f)
}
}
}
Here's the exception I usually get:
(Sometimes it's a bit different, for example I tried to make the request value transient, then it's already null when it tries to refer to request)
Caused by: java.lang.NullPointerException
at java.io.Reader.<init>(Reader.java:78)
at java.io.InputStreamReader.<init>(InputStreamReader.java:129)
at scala.io.BufferedSource.reader(BufferedSource.scala:24)
at scala.io.BufferedSource.bufferedReader(BufferedSource.scala:25)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader$lzycompute(BufferedSource.scala:35)
at scala.io.BufferedSource.scala$io$BufferedSource$$charReader(BufferedSource.scala:33)
at scala.io.BufferedSource.scala$io$BufferedSource$$decachedReader(BufferedSource.scala:62)
at scala.io.BufferedSource$BufferedLineIterator.<init>(BufferedSource.scala:67)
at scala.io.BufferedSource.getLines(BufferedSource.scala:86)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:21)
at flinkextension.HttpSource$$anonfun$httpStream$1.apply(HttpSource.scala:19)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$execute$1.apply(Http.scala:323)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:388)
at scalaj.http.HttpRequest$$anonfun$toResponse$3.apply(Http.scala:380)
at scala.Option.getOrElse(Option.scala:121)
at scalaj.http.HttpRequest.toResponse(Http.scala:380)
at scalaj.http.HttpRequest.scalaj$http$HttpRequest$$doConnection(Http.scala:360)
at scalaj.http.HttpRequest.exec(Http.scala:335)
at scalaj.http.HttpRequest.execute(Http.scala:323)
at flinkextension.HttpSource.httpStream(HttpSource.scala:19)
at flinkextension.HttpSource.run(HttpSource.scala:14)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:55)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:95)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:263)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:702)
at java.lang.Thread.run(Thread.java:748)
Everything else seems to be working fine, when I don't use a http request, but something else like file read with the same InputStream type, just a plain while loop with strings or even when I use single http requests, which aren't streaming.
I feel like I'm missing some theoretical background, maybe flink does something in the background which destroys the Http object or the InputStream, but I didn't find anything in the documentation.
UPDATE #1:
If I put a null check into the lambda, the job usually exits immediately, sometimes processes a few elements, sometimes timeouts after hanging for a minute. Here's this version of the httpStream function:
private def httpStream(f: String => Unit) = {
val request = Http(url)
request
.execute { inputStream =>
if (inputStream == null) println("null inputstream")
else {
println("not null inputstream")
fromInputStream(inputStream)
.getLines()
.takeWhile(_ => isRunning)
.foreach(f)
}
}
}
UPDATE #2:
The code actually works in distributed mode and with StreamExecutionEnvironment.createLocalEnvironment()
I only experience the issue if I use start-local.sh and submit the jar to it.

Terminate Akka-Http Web Socket connection asynchronously

Web Socket connections in Akka Http are treated as an Akka Streams Flow. This seems like it works great for basic request-reply, but it gets more complex when messages should also be pushed out over the websocket. The core of my server looks kind of like:
lazy val authSuccessMessage = Source.fromFuture(someApiCall)
lazy val messageFlow = requestResponseFlow
.merge(updateBroadcastEventSource)
lazy val handler = codec
.atop(authGate(authSuccessMessage))
.join(messageFlow)
handleWebSocketMessages {
handler
}
Here, codec is a (de)serialization BidiFlow and authGate is a BidiFlow that processes an authorization message and prevents outflow of any messages until authorization succeeds. Upon success, it sends authSuccessMessage as a reply. requestResponseFlow is the standard request-reply pattern, and updateBroadcastEventSource mixes in async push messages.
I want to be able to send an error message and terminate the connection gracefully in certain situations, such as bad authorization, someApiCall failing, or a bad request processed by requestResponseFlow. So basically, basically it seems like I want to be able to asynchronously complete messageFlow with one final message, even though its other constituent flows are still alive.
Figured out how to do this using a KillSwitch.
Updated version
The old version had the problem that it didn't seem to work when triggered by a BidiFlow stage higher up in the stack (such as my authGate). I'm not sure exactly why, but modeling the shutoff as a BidiFlow itself, placed further up the stack, resolved the issue.
val shutoffPromise = Promise[Option[OutgoingWebsocketEvent]]()
/**
* Shutoff valve for the connection. It is triggered when `shutoffPromise`
* completes, and sends a final optional termination message if that
* promise resolves with one.
*/
val shutoffBidi = {
val terminationMessageSource = Source
.maybe[OutgoingWebsocketEvent]
.mapMaterializedValue(_.completeWith(shutoffPromise.future))
val terminationMessageBidi = BidiFlow.fromFlows(
Flow[IncomingWebsocketEventOrAuthorize],
Flow[OutgoingWebsocketEvent].merge(terminationMessageSource)
)
val terminator = BidiFlow
.fromGraph(KillSwitches.singleBidi[IncomingWebsocketEventOrAuthorize, OutgoingWebsocketEvent])
.mapMaterializedValue { killSwitch =>
shutoffPromise.future.foreach { _ => println("Shutting down connection"); killSwitch.shutdown() }
}
terminationMessageBidi.atop(terminator)
}
Then I apply it just inside the codec:
val handler = codec
.atop(shutoffBidi)
.atop(authGate(authSuccessMessage))
.join(messageFlow)
Old version
val shutoffPromise = Promise[Option[OutgoingWebsocketEvent]]()
/**
* Shutoff valve for the flow of outgoing messages. It is triggered when
* `shutoffPromise` completes, and sends a final optional termination
* message if that promise resolves with one.
*/
val shutoffFlow = {
val terminationMessageSource = Source
.maybe[OutgoingWebsocketEvent]
.mapMaterializedValue(_.completeWith(shutoffPromise.future))
Flow
.fromGraph(KillSwitches.single[OutgoingWebsocketEvent])
.mapMaterializedValue { killSwitch =>
shutoffPromise.future.foreach(_ => killSwitch.shutdown())
}
.merge(terminationMessageSource)
}
Then handler looks like:
val handler = codec
.atop(authGate(authSuccessMessage))
.join(messageFlow via shutoffFlow)

Consuming a service using WS in Play

I was hoping someone can briefly go over the various ways of consuming a service (this one just returns a string, normally it would be JSON but I just want to understand the concepts here).
My service:
def ping = Action {
Ok("pong")
}
Now in my Play (2.3.x) application, I want to call my client and display the response.
When working with Futures, I want to display the value.
I am a bit confused, what are all the ways I could call this method i.e. there are some ways I see that use Success/Failure,
val futureResponse: Future[String] = WS.url(url + "/ping").get().map { response =>
response.body
}
var resp = ""
futureResponse.onComplete {
case Success(str) => {
Logger.trace(s"future success $str")
resp = str
}
case Failure(ex) => {
Logger.trace(s"future failed")
resp = ex.toString
}
}
Ok(resp)
I can see the trace in STDOUT for success/failure, but my controller action just returns "" to my browser.
I understand that this is because it returns a FUTURE and my action finishes before the future returns.
How can I force it to wait?
What options do I have with error handling?
If you really want to block until feature is completed look at the Future.ready() and Future.result() methods. But you shouldn't.
The point about Future is that you can tell it how to use the result once it arrived, and then go on, no blocks required.
Future can be the result of an Action, in this case framework takes care of it:
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
}
Look at the documentation, it describes the topic very well.
As for the error-handling, you can use Future.recover() method. You should tell it what to return in case of error and it gives you new Future that you should return from your action.
def index = Action.async {
WS.url(url + "/ping").get()
.map(response => Ok("Got result: " + response.body))
.recover{ case e: Exception => InternalServerError(e.getMessage) }
}
So the basic way you consume service is to get result Future, transform it in the way you want by using monadic methods(the methods that return new transformed Future, like map, recover, etc..) and return it as a result of an Action.
You may want to look at Play 2.2 -Scala - How to chain Futures in Controller Action and Dealing with failed futures questions.

Play Framework WebSocket Async

I'm using a WebSocket end point exposed by my Play Framework controller. My client will however send a large byte array and I'm a bit confused on how to handle this in my Iteratee. Here is what I have:
def myWSEndPoint(f: String => String) = WebSocket.async[Array[Byte]] {
request =>
Akka.future {
val (out, chan) = Concurrent.broadcast[Array[Byte]]
val in: Iteratee[Array[Byte], Unit] = Iteratee.foreach[Array[Byte]] {
// How do I get the entire file?
}
(null, null)
}
}
As it can be seen in the code above, I'm stuck on the line on how to handle the Byte array as one request and send the response back as a String? My confusion is on the Iteratee.foreach call. Is this foreach a foreach on the byte array or the entire content of the request that I send as a byte array from my client? It is confusing!
Any suggestions?
Well... It depends. Is your client sending all binaries at once, or is it (explicitly) chunk by chunk?
-> If it's all at once, then everything will be in the first chunk (therefore why a websocket? Why an Iteratee? Actions with BodyParser will probably be more efficient for that).
-> If it's chunk by chunk you have to keep every chunks you receive, and concatenate them on close (on close, unless you have another way for the client to say: "Hey I'm done!").