I'm using a WebSocket endpoint exposed by my Play Framework controller. My client, however, will send a large byte array, and I'm a bit confused about how to handle it in my Iteratee. Here is what I have:
def myWSEndPoint(f: String => String) = WebSocket.async[Array[Byte]] { request =>
  Akka.future {
    val (out, chan) = Concurrent.broadcast[Array[Byte]]
    val in: Iteratee[Array[Byte], Unit] = Iteratee.foreach[Array[Byte]] { bytes =>
      // How do I get the entire file?
    }
    (in, out)
  }
}
As can be seen in the code above, I'm stuck on how to handle the byte array as one request and send the response back as a String. My confusion is with the Iteratee.foreach call: is this a foreach over the bytes of the array, or over the entire content of the request that I send as a byte array from my client? It is confusing!
Any suggestions?
Well... it depends. Is your client sending all the binary data at once, or explicitly chunk by chunk?
-> If it's all at once, then everything will be in the first chunk (in that case, why a WebSocket? Why an Iteratee? An Action with a BodyParser will probably be more efficient for that).
-> If it's chunk by chunk, you have to keep every chunk you receive and concatenate them on close (on close, unless the client has another way to say: "Hey, I'm done!"). A sketch of that approach follows.
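Here is a minimal sketch of the chunk-by-chunk case, assuming the payload is UTF-8 text that f transforms (the buffering and encoding choices are illustrative, not prescribed):

import play.api.libs.concurrent.Execution.Implicits.defaultContext
import scala.collection.mutable.ArrayBuffer

def myWSEndPoint(f: String => String) = WebSocket.async[Array[Byte]] { request =>
  Akka.future {
    val (out, chan) = Concurrent.broadcast[Array[Byte]]
    // fold accumulates every incoming chunk; its result is only produced
    // when the client closes the socket (EOF), i.e. "Hey, I'm done!".
    val in = Iteratee.fold[Array[Byte], ArrayBuffer[Byte]](ArrayBuffer.empty[Byte]) {
      (buffer, chunk) => buffer ++= chunk
    }.map { buffer =>
      // The complete payload is available here: apply f and push the reply.
      chan.push(f(new String(buffer.toArray, "UTF-8")).getBytes("UTF-8"))
      chan.eofAndEnd()
    }
    (in, out)
  }
}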
I would like to chain HTTP requests using the akka-http client as a stream. Each HTTP request in the chain depends on the success/response of the previous request and uses it to construct the next request. If a request is not successful, the stream should return the response of the unsuccessful request.
How can I construct such a stream in akka-http?
Which akka-http client-level API should I use?
If you're making a web crawler, have a look at this post. This answer tackles a simpler case, such as downloading paginated resources, where the link to the next page is in a header of the current page's response.
You can create a chained source, where one item leads to the next, using the Source.unfoldAsync method. This takes a function from a state S to a Future[Option[(S, E)]]: as long as the future completes with Some, the stream emits an element of type E and passes the new state to the next invocation; None ends the stream.
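As a standalone illustration of the mechanics (nothing HTTP-specific; the numbers are arbitrary), here is unfoldAsync emitting the integers 0 to 2:

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()

// The state S is the next integer to emit; return None once it reaches 3.
Source.unfoldAsync(0) { n =>
  if (n < 3) Future.successful(Some((n + 1, n))) // (next state, emitted element)
  else Future.successful(None)                   // end of stream
}.runWith(Sink.foreach(println))                 // prints 0, 1, 2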
In your case, this is kind of like:
taking an initial HttpRequest
producing a Future[HttpResponse]
if the response points to another URL, returning Some(request -> response), otherwise None
However, there's a wrinkle, which is that this will not emit a response from the stream if it doesn't contain a pointer to the next request.
To get around this, you can make the function passed to unfoldAsync return Future[Option[(Option[HttpRequest], HttpResponse)]]. This allows you to handle the following situations:
the current response is an error
the current response points to another request
the current response doesn't point to another request
What follows is some annotated code which outlines this approach, but first a preliminary:
When streaming HTTP requests to responses in Akka Streams, you need to ensure that the response body is consumed, otherwise bad things will happen (deadlocks and the like). If you don't need the body you can ignore it, but here we use a function to convert the HttpEntity from a (potentially) streamed entity into a strict one:
import scala.concurrent.duration._

def convertToStrict(r: HttpResponse): Future[HttpResponse] =
  r.entity.toStrict(10.minutes).map(e => r.withEntity(e))
Next, a couple of functions to create an Option[HttpRequest] from an HttpResponse. This example uses a scheme like GitHub's pagination links, where the Link header contains, e.g., <https://api.github.com/...>; rel="next":
def nextUri(r: HttpResponse): Seq[Uri] = for {
  linkHeader <- r.header[Link].toSeq
  value <- linkHeader.values
  param <- value.params if param.key == "rel" && param.value() == "next"
} yield value.uri

def getNextRequest(r: HttpResponse): Option[HttpRequest] =
  nextUri(r).headOption.map(next => HttpRequest(HttpMethods.GET, next))
Next, the real function we'll pass to unfoldAsync. It uses the Akka HTTP Http().singleRequest() API to take an HttpRequest and produce a Future[HttpResponse]:
def chainRequests(reqOption: Option[HttpRequest]): Future[Option[(Option[HttpRequest], HttpResponse)]] =
  reqOption match {
    case Some(req) => Http().singleRequest(req).flatMap { response =>
      // Handle the error case. Here we just return the errored response
      // with no next item.
      if (response.status.isFailure()) Future.successful(Some(None -> response))
      // Otherwise, convert the response to a strict response by
      // consuming the body, and look for a next request.
      else convertToStrict(response).map { strictResponse =>
        getNextRequest(strictResponse) match {
          // If we have no next request, return Some containing an
          // empty state, but the current value...
          case None => Some(None -> strictResponse)
          // ...otherwise, pass on the request.
          case next => Some(next -> strictResponse)
        }
      }
    }
    // Finally, if there's no next request, end the stream by
    // returning None as the state.
    case None => Future.successful(None)
  }
Note that if we get an errored response, the stream will not continue since we return None in the next state.
You can invoke this to get a stream of HttpResponse objects like so:
val initialRequest = HttpRequest(HttpMethods.GET, "http://www.my-url.com")
Source.unfoldAsync[Option[HttpRequest], HttpResponse](
  Some(initialRequest))(chainRequests)
As for returning the value of the last (or errored) response, you simply need to use Sink.last, since the stream will end either when it completes successfully or on the first errored response. For example:
def getStatus: Future[StatusCode] =
  Source.unfoldAsync[Option[HttpRequest], HttpResponse](Some(initialRequest))(chainRequests)
    .map(_.status)
    .runWith(Sink.last)
I'm using Flink Streaming to process data traffic logs from a 3G network (GPRS Tunnelling Protocol), and I'm having trouble synthesizing the information that belongs to a single user session.
For example: how do I map the start and end of one session? I don't know whether Flink streaming is suited to handling complex protocols like this.
P/S:
We capture the data exchanged between the SGSN and GGSN in a 3G network (the GTP protocol, with GTP-C/U messages). A session is started when the SGSN sends a CreateReq (TEID, Seq, IMSI, TEID_dl, TEID_data_dl) message and the GGSN responds with a CreateRsp (TEID_dl, Seq, TEID_ul, TEID_data_ul) message.
After the session is established, other GTP-C messages (e.g. UpdateReq, DeleteReq) sent from the SGSN to the GGSN use TEID_ul and the response messages use TEID_dl; GTP-U messages use TEID_data_ul (SGSN -> GGSN) and TEID_data_dl (GGSN -> SGSN). GTP-U messages contain information such as the AppID (facebook, twitter, web), URL, ...
Finally, I want to process the continuous log data stream and match the GTP-C and GTP-U messages of the same user (IMSI) to produce a report.
I've tried this:
val sessions = createReqs.connect(createRsps).flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {
  // holds CreateReqs indexed by (teid_dl, seq)
  private val createReqs = mutable.HashMap.empty[(String, String), CreateReq]
  // holds CreateRsps indexed by (teid, seq)
  private val createRsps = mutable.HashMap.empty[(String, String), CreateRsp]

  override def flatMap1(req: CreateReq, out: Collector[Session]): Unit = {
    val key = (req.teid_dl, req.header.seqNum)
    val oRsp = createRsps.get(key)
    if (oRsp.isDefined) {
      val rsp = oRsp.get
      println("OK")
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createRsps.remove(key)
    } else {
      createReqs.put(key, req)
    }
  }

  override def flatMap2(rsp: CreateRsp, out: Collector[Session]): Unit = {
    val key = (rsp.header.teid, rsp.header.seqNum)
    val oReq = createReqs.get(key)
    if (oReq.isDefined) {
      val req = oReq.get
      out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
      createReqs.remove(key)
    } else {
      createRsps.put(key, rsp)
    }
  }
}).print()
This code always returns an empty result, even though the input stream contains CreateRsp and CreateReq messages of the same session, appearing very close together (within 1 second). When I debug, oReq.isEmpty == true every time.
What am I doing wrong?
To be honest, it is a bit difficult to see through the telco specifics here, but if I understand correctly you have at least three streams, the first two being the CreateReq and the CreateRsp streams.
To detect the establishment of a session, I would use the ConnectedDataStream abstraction to share state between those two streams. Check out this example for usage, or the related Flink docs.
Is this what you are trying to achieve?
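One educated guess about the empty result, beyond what the snippet shows: if the connected streams are not keyed and the job runs with parallelism greater than 1, each parallel instance of the CoFlatMapFunction keeps its own HashMaps, so a CreateReq and its matching CreateRsp can land in different instances and never meet. Keying both streams on the fields used for the lookup routes matching messages to the same instance, roughly like this (method names follow the current Flink streaming API):

val sessions = createReqs.connect(createRsps)
  // route a CreateReq and its CreateRsp to the same parallel instance
  .keyBy(req => (req.teid_dl, req.header.seqNum),
         rsp => (rsp.header.teid, rsp.header.seqNum))
  .flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {
    // ... same matching logic as in the question ...
  })
  .print()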
I think this used to be possible in Play 1.x, but I can't find how to do it in Play 2.x.
I know that Play is asynchronous and uses Iteratees. However, there is generally much better library support for InputStreams.
(In this case, I will be using a streaming JSON parser like Jackson to process the request body.)
How can I get an InputStream from a chunked request body?
I was able to get this working with the following code:
// I think all of the parens and braces line up -- copied/pasted from code
val pos = new PipedOutputStream()
val pis = new PipedInputStream(pos)
val result = Promise[Either[Errors, String]]()

def outputStreamBodyParser = {
  BodyParser("outputStream") { requestHeader =>
    val length = requestHeader.headers(HeaderNames.CONTENT_LENGTH).toLong
    Future {
      // saveFile reads from the piped InputStream and returns
      // Future[Either[Errors, String]]. It must run on a different
      // thread than the Iteratee below, or the two could deadlock.
      result.completeWith(saveFile(pis, length))
    }
    Iteratee.fold[Array[Byte], OutputStream](pos) { (os, data) =>
      os.write(data)
      os
    }.map { os =>
      os.close()
      Right(os)
    }
  }
}

Action.async(parse.when(
  requestHeaders => {
    val maybeContentLength = requestHeaders.headers.get(HeaderNames.CONTENT_LENGTH)
    maybeContentLength.isDefined && allCatch.opt(maybeContentLength.get.toLong).isDefined
  },
  outputStreamBodyParser,
  requestHeaders => Future.successful(BadRequest("Missing content-length header")))) { request =>
  result.future.map {
    case Right(fileRef) => Ok(fileRef)
    case Left(errors) => BadRequest(errors)
  }
}
Play 2 is meant to be fully asynchronous, so this isn't easily possible, or desirable. The problem with InputStream is that there is no push-back: there is no way for the reader of the InputStream to tell the producer it wants more data without blocking on read. Technically it is possible to write an Iteratee that reads data and puts it into an InputStream, waiting for a call to read on the InputStream before asking the Enumerator for more data, but it would be dangerous. You would have to make sure that the InputStream was closed properly, or the Enumerator would sit waiting forever (or until it times out), and the call to read must be made from a thread that is not running on the same ExecutionContext as the Enumerator and Iteratee, or the application could deadlock.
In the simple case of proxying a download from S3 to the client, I'm having trouble dealing with client disconnections mid-download.
val enumerator = Enumerator.outputStream { out =>
  val s3Object = s3Client.getObject(new GetObjectRequest(bucket, path))
  val in = new BufferedInputStream(s3Object.getObjectContent())
  val bufferedOut = new BufferedOutputStream(out)
  Iterator.continually(in.read).takeWhile(_ != -1).foreach(bufferedOut.write)
  in.close()
  bufferedOut.close()
}
Ok.chunked(enumerator.andThen(Enumerator.eof)).withHeaders(
  "Content-Disposition" -> s"attachment; filename=${name}"
)
This works (mostly) beautifully as long as the client doesn't cancel the download before it completes. Otherwise, on cancellation the enumerator keeps filling up with data until the download from S3 is complete. Several cancelled downloads can hog quite a bit of resources.
Are there any better patterns that can prevent this from happening?
Move closing the InputStream into an onDoneEnumerating block, and use Enumerator.fromStream instead of Enumerator.outputStream:
val s3Object = s3Client.getObject(new GetObjectRequest(bucket, path))
val in = s3Object.getObjectContent()

val enumerator = Enumerator.fromStream(in).onDoneEnumerating {
  in.close()
}
fromStream reads chunks (8 KiB by default) at a time, so a BufferedInputStream is unnecessary.
Also, with outputStream, there is no way for the Iteratee to push back if it is slow to consume the data, which could result in a large amount of data (up to the size of the S3 object) buffering in memory. This is dangerous, because the application could run out of memory. With fromStream, the next read won't occur until the Iteratee is ready for more data.
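For completeness, a sketch of the full action under this approach (s3Client, bucket, path, and name are assumed from the question's context):

import play.api.libs.concurrent.Execution.Implicits.defaultContext
import play.api.libs.iteratee.Enumerator
import play.api.mvc._

def download = Action {
  val s3Object = s3Client.getObject(new GetObjectRequest(bucket, path))
  val in = s3Object.getObjectContent()
  // onDoneEnumerating runs when enumeration stops for any reason,
  // including the client cancelling the download mid-stream.
  val enumerator = Enumerator.fromStream(in).onDoneEnumerating {
    in.close()
  }
  Ok.chunked(enumerator).withHeaders(
    "Content-Disposition" -> s"attachment; filename=${name}"
  )
}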
In our project we are using ReactiveMongo with Play 2.2.1.
The problem is that the stream of data in the form of an Enumerator[A] returned by ReactiveMongo is actually a stream of value objects, which are not separated by commas and have no stream beginning and end annotations that could be treated as array opening and closing statements.
This creates a problem for the JSON-consuming JS client, since the expected format is
[A1, A2, ...]
So we jumped through hoops and transformed our Enumerator[A] into an Enumerator[String], checking whether each element is the first or not:
var first: Boolean = true
val aToStrs = as.map { a =>
  if (first) {
    first = false
    Json.stringify(Json.toJson(a))
  } else {
    "," + Json.stringify(Json.toJson(a))
  }
}
Ok.chunked(
  Enumerator.enumInput(Input.El("[")) andThen
    aToStrs andThen
    Enumerator.enumInput(Input.El("]")) andThen
    Enumerator.enumInput(Input.EOF)
)
This works, but feels like inventing the wheel.
Is there a better solution, for this common problem?
If you use Comet or EventSource you won't have to hand-craft a way to generate the output, and you will also be able to parse the response item by item on the client. Responding with an array would force you to either write your own parsing code or wait until everything has arrived on the client side before you can use the built-in JSON parser in your JavaScript.
Streaming with the EventSource protocol is pretty easy with Play; you should be able to do something like this:
implicit val cometEncoder = new Comet.CometMessage[JsValue](_.toString)
Ok.chunked(yourEnumerator &> EventSource()).as(EVENT_STREAM)
And then in the client html:
<script type="text/javascript">
  var es = new EventSource(jsRouter.controllers.Application.streamIt().url)
  es.addEventListener("message", function (event) {
    var item = JSON.parse(event.data)
    // ... do something with the json value ...
  })
</script>
There is an example of this in the Play sample projects that you might want to look at as well: $YOUR_PLAY_DIR/samples/scala/eventsource-clock/