Akka-http - write to response output stream - scala

I need to create a large tab separated file as a response to HTTP GET request.
In my route a create some Scala objects and then I want to write some custom representation of those objects to the Output Stream.
It is not just serialization to tab separated instead of JSON, because I need also to create a header with column names, so IMHO this can't be solved with custom marshaling.
So how can I get a writer or outputstream from HttpRequest?
Something like
~path("export") {
get {
val sampleExonRPKMs = exonRPKMService.getRPKMs(samples)
val writer = HttpResponse().getWriter // this does not exists
writeHeader(writer)
... // write objects tab separated
}
}

You can complete an Akka HTTP route with a marshallable source. If you don't want to use custom marshallers, you can always complete with a Source[ByteString, _]. See the docs for more info.
Your route will look something like
get {
val sampleExonRPKMs = exonRPKMService.getRPKMs(samples)
val headers: String = ???
Source.single(headers).concat(Source(sampleExonRPKMs).map(_.toTSVLine)).intersperse("\n").map(ByteString.apply)
}
Note as a separate issue: if you are dealing with large amounts of data, the getRPKMs call will result in loading all of it in memory.

Related

How to use Flink streaming to process Data stream of Complex Protocols

I'm using Flink Stream for the handling of data traffic log in 3G network (GPRS Tunnelling Protocol). And I'm having trouble in the synthesis of information in a user session of the user.
For example: how to map the start and end one session. I don't know that there Flink streaming suited to handle complex protocols like that?
p/s:
We capture data exchanging between SGSN and GGSN in 3G network (use GTP protocol with GTP-C/U messages). A session is started when the SGSN sends the CreateReq (TEID, Seq, IMSI, TEID_dl,TEID_data_dl) message and GGSN responses CreateRsp(TEID_dl, Seq, TEID_ul, TEID_data_ul) message.
After the session is established, others GTP-C messages (ex: UpdateReq, DeleteReq) sent from SGSN to GGSN uses TEID_ul and response message uses TEID_dl, GTP- U message uses TEID_data_ul (SGSN -> GGSN) and TEID_data_dl (GGSN -> SGSN). GTP-U messages contain information such as AppID (facebook, twitter, web), url,...
Finally, I want to handle continuous log data stream and map the GTP-C messages and GTP-U of the same one user (IMSI) to make a report.
I've tried this:
val sessions = createReqs.connect(createRsps).flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {
// holds CreateReqs indexed by (tedid_dl,seq)
private val createReqs = mutable.HashMap.empty[(String, String), CreateReq]
// holds CreateRsps indexed by (tedid,seq)
private val createRsps = mutable.HashMap.empty[(String, String), CreateRsp]
override def flatMap1(req: CreateReq, out: Collector[Session]): Unit = {
val key = (req.teid_dl, req.header.seqNum)
val oRsp = createRsps.get(key)
if (!oRsp.isEmpty) {
val rsp = oRsp.get
println("OK")
out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
createRsps.remove(key)
} else {
createReqs.put(key, req)
}
}
override def flatMap2(rsp: CreateRsp, out: Collector[Session]): Unit = {
val key = (rsp.header.teid, rsp.header.seqNum)
val oReq = createReqs.get(key)
if (!oReq.isEmpty) {
val req = oReq.get
out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
createReqs.remove(key)
} else {
createRsps.put(key, rsp)
}
}
}).print()
This code always returns empty result. The fact that the input stream contains CreateRsp and CreateReq message of the same session. They appear very close together (within 1 second). When I debug, the oReq.isEmpty == true every time.
What i'm doing wrong?
To be honest it is a bit difficult to see through the telco specifics here, but if I understand correctly you have at least 3 streams, the first two being the CreateReq and the CreateRsp streams.
To detect the establishment of a session I would use the ConnectedDataStream abstraction to share state between the two aforementioned streams. Check out this example for usage or the related Flink docs.
Is this what you are trying to achieve?

How to properly use spray.io LruCache

I am quite an unexperienced spray/scala developer, I am trying to properly use spray.io LruCache. I am trying to achieve something very simple. I have a kafka consumer, when it reads something from its topic I want it to put the value it reads to cache.
Then in one of the routings I want to read this value, the value is of type string, what I have at the moment looks as follows:
object MyCache {
val cache: Cache[String] = LruCache(
maxCapacity = 10000,
initialCapacity = 100,
timeToLive = Duration.Inf,
timeToIdle = Duration(24, TimeUnit.HOURS)
)
}
to put something into cache i use following code:
def message() = Future { new String(singleMessage.message()) }
MyCache.cache(key, message)
Then in one of the routings I am trying to get something from the cache:
val res = MyCache.cache.get(keyHash)
The problem is the type of res is Option[Future[String]], it is quite hard and ugly to access the real value in this case. Could someone please tell me how I can simplify my code to make it better and more readable ?
Thanks in advance.
Don't try to get the value out of the Future. Instead call map on the Future to arrange for work to be done on the value when the Future is completed, and then complete the request with that result (which is itself a Future). It should look something like this:
path("foo") {
complete(MyCache.cache.get(keyHash) map (optMsg => ...))
}
Also, if singleMessage.message does not do I/O or otherwise block, then rather than creating the Future like you are
Future { new String(singleMessage.message) }
it would be more efficient to do it like so:
Future.successful(new String(singleMessage.message))
The latter just creates an already completed Future, bypassing the use of an ExecutionContext to evaluate the function.
If singleMessage.message does do I/O, then ideally you would do that I/O with some library (like Spray client, if it's an HTTP request) that returns a Future (rather than using Future { ... } to create another thread which will block).

Play Framework WebSocket Async

I'm using a WebSocket end point exposed by my Play Framework controller. My client will however send a large byte array and I'm a bit confused on how to handle this in my Iteratee. Here is what I have:
def myWSEndPoint(f: String => String) = WebSocket.async[Array[Byte]] {
request =>
Akka.future {
val (out, chan) = Concurrent.broadcast[Array[Byte]]
val in: Iteratee[Array[Byte], Unit] = Iteratee.foreach[Array[Byte]] {
// How do I get the entire file?
}
(null, null)
}
}
As it can be seen in the code above, I'm stuck on the line on how to handle the Byte array as one request and send the response back as a String? My confusion is on the Iteratee.foreach call. Is this foreach a foreach on the byte array or the entire content of the request that I send as a byte array from my client? It is confusing!
Any suggestions?
Well... It depends. Is your client sending all binaries at once, or is it (explicitly) chunk by chunk?
-> If it's all at once, then everything will be in the first chunk (therefore why a websocket? Why an Iteratee? Actions with BodyParser will probably be more efficient for that).
-> If it's chunk by chunk you have to keep every chunks you receive, and concatenate them on close (on close, unless you have another way for the client to say: "Hey I'm done!").

How to return Enumerator of JSON from ReactiveMongo in Play 2

In our project we are using ReactiveMongo with Play 2.2.1.
The problem is, that to stream of data in a form of Enumerator[A], returned by ReactiveMongo is actually a stream of value objects, which are not separated by commas, and do not have stream beginning and end annotations, which can be treated as array opening and close statements.
This creates a problem for JSON consumer JS client, since the expected format is
[A1,A2, ...]
So we jumped in hoops, and transformed our Enumeratee[A] to Enumerator[String], with checking, if it's the first element, or not:
var first:Boolean = true
val aToStrs = (as.map(a => {
if(first) {
first = false;
Json.stringify(Json.toJson(a))
} else {
"," + Json.stringify(Json.toJson(a))
}
}))
Ok.chunked(
Enumerator.enumInput(Input.El("[")) andThen
aToStrs andThen
Enumerator.enumInput(Input.El("]")) andThen
Enumerator.enumInput(Input.EOF)
)
This works, but feels like inventing the wheel.
Is there a better solution, for this common problem?
If you use comet or EventSource you wont have to hand craft a way to generate output and you will also actually be able to parse the response item for item in the client. Responding with an array would force you to either write your own parsing code or wait until everything has arrived on the client side before you can use the build int JSON parser in your JavaScript.
Streaming with the EventSource protocol is pretty easy with play, you should be able to do something like:
implicit val cometEncoder = new Comet.CometMessage[JsValue](_.toString)
Ok.chunked(yourEnumerator &> EventSource()).as(EVENT_STREAM)
And then in the client html:
<script type="text/javascript">
var es = new EventSource(jsRouter.controllers.Application.streamIt().url)
es.addEventListener("message", function (event) {
var item = JSON.parse(event.data)
// ... do something with the json value ...
})
</script>
There is an example of this in the play sample projects that you might want to look at as well $YOUR_PLAY_DIR/samples/scala/eventsource-clock/

Using Streams in Gatling repeat blocks

I've come across the following code in a Gatling scenario (modified for brevity/privacy):
val scn = scenario("X")
.repeat(numberOfLoops, "loopName") {
exec((session : Session) => {
val loopCounter = session.getTypedAttribute[Int]("loopName")
session.setAttribute("xmlInput", createXml(loopCounter))
})
.exec(
http("X")
.post("/rest/url")
.headers(headers)
.body("${xmlInput}"))
)
}
It's naming the loop in the repeat block, getting that out of the session and using it to create a unique input XML. It then sticks that XML back into the session and extracts it again when posting it.
I would like to do away with the need to name the loop iterator and accessing the session.
Ideally I'd like to use a Stream to generate the XML.
But Gatling controls the looping and I can't recurse. Do I need to compromise, or can I use Gatling in a functional way (without vars or accessing the session)?
As I see it, neither numberOfLoops nor createXml seem to depend on anything user related that would have been stored in the session, so the loop could be resolved at build time, not at runtime.
import com.excilys.ebi.gatling.core.structure.ChainBuilder
def addXmlPost(chain: ChainBuilder, i: Int) =
chain.exec(
http("X")
.post("/rest/url")
.headers(headers)
.body(createXml(i))
)
def addXmlPostLoop(chain: ChainBuilder): ChainBuilder =
(0 until numberOfLoops).foldLeft(chain)(addXmlPost)
Cheers,
Stéphane
PS: The preferred way to ask something about Gatling is our Google Group: https://groups.google.com/forum/#!forum/gatling