Efficient JSON-to-JSON transformations with spray-json - scala

I have a scenario similar to this: an Akka HTTP service calls another service and performs some transformations on its JSON response. Let's say it replaces "http" with "https" on every "link" attribute's value.
Right now the implementation is something like:
def route: Route =
callToAnotherService(request) { eventualJsonResponse =>
complete(
eventualJsonResponse.flatMap(
(jsonResponse: HttpResponse) => {
Unmarshal(jsonResponse.entity.withContentType(MediaTypes.`application/json`))
.to[JsValue]
.map(replaceHttpInLinks)
.flatMap(Marshal(_).to[ResponseEntity])
.map(responseEntity => jsonResponse.copy(entity = responseEntity))
}
)
)
}
The transformation method has the following signature:
def replaceHttpInLinks(jsValue: JsValue): JsValue = {
// Recursively find "link" attributes and replace protocol
}
As you can see, the called service's JSON response is unmarshalled into a JsValue object and then this object is used to perform the changes.
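For reference, the recursive traversal described above might look roughly like the sketch below. It is written against a tiny stand-in ADT (Json, JObject, JArray, JString, JOther are hypothetical names, not spray-json types) so that it runs without any library; spray-json's JsObject, JsArray and JsString have the same shape, so the pattern match should carry over. Note that it still builds the whole tree in memory, which is exactly the concern raised here.

```scala
object LinkRewrite {
  // Stand-in ADT for illustration only; NOT spray-json itself.
  sealed trait Json
  final case class JObject(fields: Map[String, Json]) extends Json
  final case class JArray(elements: Vector[Json]) extends Json
  final case class JString(value: String) extends Json
  final case class JOther(raw: String) extends Json // numbers, booleans, null

  // Walks the tree and rewrites the protocol of every "link" string value.
  def replaceHttpInLinks(json: Json): Json = json match {
    case JObject(fields) =>
      JObject(fields.map { case (key, value) => key -> rewriteField(key, value) })
    case JArray(elements) =>
      JArray(elements.map(replaceHttpInLinks))
    case other =>
      other // strings outside "link" and all other leaves are left untouched
  }

  // Only a string stored under the key "link" is rewritten; everything else recurses.
  private def rewriteField(key: String, value: Json): Json = (key, value) match {
    case ("link", JString(url)) if url.startsWith("http://") =>
      JString("https://" + url.stripPrefix("http://"))
    case _ =>
      replaceHttpInLinks(value)
  }
}
```

Every object and array on the way to a changed link is copied, so the cost is proportional to the size of the document rather than to the number of links.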
That response can be huge, and I'm concerned about both performance and memory consumption.
I was looking for a way of making those changes without unmarshalling the whole JSON document, and hopefully without introducing foreign libraries (Play JSON or others). I was thinking of something event based, along the lines of the old SAX API for XML.
Does anyone have an idea of how to achieve this?

I think this is more complicated with Spray, because it will try to build the JsValue from the entire HTTP body. My suggestion is to use Circe and its HCursor to unmarshal manually. Take a look at some examples here.
You can integrate circe with Akka: https://github.com/hseeberger/akka-http-json

Related

What is an entity in Akka-Http?

I am new to akka-http and am building a basic server-client application in Scala. The examples I looked at have an "entity" object. Can someone please explain the underlying concept, why it is used, and how it is useful?
post {
path("insert") {
entity(as[Student]) {
obj => complete {
insertingstudent(obj)
s"got obj with name ${obj.getName()}"
}
}
}
}
Thanks
Can someone please explain the concept underlying and why is it used
and how is it useful?
entity is of type HttpEntity. From the comments of the code:
Models the entity (aka "body" or "content") of an HTTP message.
It is an abstraction over the content of the HTTP request. Often, when one sends an HTTP request, they provide a payload in the body of the request. This body can be in many formats; popular ones are JSON and XML.
When you write:
entity(as[Student])
You are attempting to unmarshal, or deserialize, the body of the request into a data structure of your choosing. That means that the obj parameter of the function that follows will be of type Student.

How can I perform session based logging in Play Framework

We are currently using the Play Framework with its standard logging mechanism. We have implemented an implicit context to pass the username and session id to all service methods. We want logging to be session based, which requires implementing our own logger. This works for our own logs, but how do we do the same for the basic exception handling and the logs it produces? Is there a better way to capture this than with implicits, or can we override the exception-handling logging? Essentially, we want as many log messages as possible to be associated with the session.
It depends if you are doing reactive style development or standard synchronous development:
If standard synchronous development (i.e. no futures, 1 thread per request), then I'd recommend you just use MDC, which stores values in a ThreadLocal for logging. You can then customise the output in logback / log4j. When you get the username / session (possibly in a Filter or in your controller), you can set the values there and then, and you do not need to pass them around with implicits.
If you are doing reactive development you have a couple options:
You can still use MDC, except you'd have to use a custom Execution Context that effectively copies the MDC values to the thread, since each request could in theory be handled by multiple threads. (as described here: http://code.hootsuite.com/logging-contextual-info-in-an-asynchronous-scala-application/)
The alternative is the solution which I tend to use (and close to what you have now): You could make a class which represents MyAppRequest. Set the username, session info, and anything else, on that. You can continue to pass it around as an implicit. However, instead of using Action.async, you make your own MyAction class which can be used as shown below
myAction.async { implicit myRequest => //some code }
Inside the myAction, you'd have to catch all Exceptions and deal with future failures, and do the error handling manually instead of relying on the ErrorHandler. I often inject myAction into my Controllers and put common filter functionality in it.
The down side of this is, it is just a manual method. Also I've made MyAppRequest hold a Map of loggable values which can be set anywhere, which means it had to be a mutable map. Also, sometimes you need to make more than one myAction.async. The pro is, it is quite explicit and in your control without too much ExecutionContext/ThreadLocal magic.
Here is some very rough sample code as a starter, for the manual solution:
def logErrorAndRethrow(myrequest:MyRequest, x:Throwable): Nothing = {
//log your error here in the format you like
throw x //you can do this or handle errors how you like
}
class MyRequest {
val attr : mutable.Map[String, String] = new mutable.HashMap[String, String]()
}
//make this a util to inject, or move it into a common parent controller
def myAsync(block: MyRequest => Future[Result] ): Action[AnyContent] = {
val myRequest = new MyRequest()
try {
Action.async(
block(myRequest).recover { case cause => logErrorAndRethrow(myRequest, cause) }
)
} catch {
case x:Throwable =>
logErrorAndRethrow(myRequest, x)
}
}
//the method your Route file refers to
def getStuff = myAsync { request:MyRequest =>
//execute your code here, passing around request as an implicit
Future.successful(Results.Ok)
}
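For the MDC option mentioned earlier, the idea of an ExecutionContext that copies the logging context onto the worker thread can be sketched roughly as follows. This is only a sketch under assumptions: LogContext is a hypothetical ThreadLocal stand-in for org.slf4j.MDC (so the snippet runs without a logging library), and ContextPropagatingEC is a made-up name. With the real MDC you would call MDC.getCopyOfContextMap and MDC.setContextMap in the same two places.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ContextPropagation {
  // Hypothetical stand-in for org.slf4j.MDC: a per-thread map of loggable values.
  object LogContext {
    private val local = new ThreadLocal[Map[String, String]] {
      override def initialValue(): Map[String, String] = Map.empty
    }
    def get: Map[String, String] = local.get()
    def set(values: Map[String, String]): Unit = local.set(values)
  }

  // Wraps another ExecutionContext. The caller's context is captured when a task
  // is submitted and installed on the worker thread while the task runs.
  final class ContextPropagatingEC(delegate: ExecutionContext) extends ExecutionContext {
    override def execute(task: Runnable): Unit = {
      val captured = LogContext.get // read on the submitting thread
      delegate.execute(new Runnable {
        override def run(): Unit = {
          val previous = LogContext.get
          LogContext.set(captured)
          try task.run()
          finally LogContext.set(previous) // do not leak into the next task
        }
      })
    }
    override def reportFailure(cause: Throwable): Unit = delegate.reportFailure(cause)
  }

  // Sets a session id on the calling thread and reads it back inside a Future.
  def demo(): Option[String] = {
    val pool = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext =
      new ContextPropagatingEC(ExecutionContext.fromExecutor(pool))
    try {
      LogContext.set(Map("sessionId" -> "abc123"))
      Await.result(Future(LogContext.get.get("sessionId")), 5.seconds)
    } finally {
      LogContext.set(Map.empty)
      pool.shutdown()
    }
  }
}
```

The trade-off is the one described above: this keeps the controllers free of implicits, at the price of some hidden ExecutionContext/ThreadLocal machinery that every Future in the application has to go through.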

How can Autowire library (Scalajs) perform more than one HTTP method type in doCall?

Looking at the examples and reading through the documentation of lihaoyi's Autowire library for ScalaJs, I can't see a way for one autowire.Client to perform both GET and POST, as it only allows overriding doCall:
def doCall(req: Request): Future[PickleType]
And request is simply just:
case class Request[PickleType](path : Seq[String], args: Map[String, PickleType])
Is there a nice solution (other than having multiple autowire.Client instances) to this limitation so that my api can follow the GET for getting, POST and UPDATE for creating and updating (REST convention)?
It depends on your exact requirements, but keep in mind that inside doCall you can potentially do different calls depending on the circumstances. For example, in Querki my doCall implementation looks like this:
override def doCall(req: Request): Future[String] = {
try {
if (DataAccess.space.isEmpty) {
makeCall(req, controllers.ClientController.rawApiRequest())
} else {
makeCall(req, controllers.ClientController.apiRequest(
DataAccess.userName,
DataAccess.spaceId.underlying))
}
} catch {
...
}
}
This is choosing the API to call based on external runtime information, but you could introspect the Request, and decide what API to call that way. I suspect that's what you are looking for.
It does require some external information telling you which Request corresponds to which method, but I don't see much way around that -- Autowire isn't inherently about HTTP (and it's pretty common for all Autowire calls to go to a single HTTP entry point), so you're doing something rather out of the ordinary here. If you're using Play, I believe you can get the method information from the JavascriptReverseRouter; if not, you should see whether your HTTP implementation provides any sort of introspection that you can use...
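To illustrate the "introspect the Request" idea, here is a rough sketch that picks an HTTP method from the last segment of the path. The Request case class mirrors the one quoted in the question; everything else (the HttpMethod type, methodFor, and the get/list/update naming convention) is a hypothetical assumption made for this example, not something Autowire provides.

```scala
object MethodRouting {
  // Same shape as the autowire Request quoted in the question.
  final case class Request[PickleType](path: Seq[String], args: Map[String, PickleType])

  sealed trait HttpMethod
  case object Get extends HttpMethod
  case object Post extends HttpMethod
  case object Put extends HttpMethod

  // Assumed convention: the last path segment is the name of the API method,
  // and its prefix tells us which HTTP method to use. Anything else is a POST.
  def methodFor(req: Request[_]): HttpMethod =
    req.path.lastOption match {
      case Some(name) if name.startsWith("get") || name.startsWith("list") => Get
      case Some(name) if name.startsWith("update") => Put
      case _ => Post
    }
}
```

Inside doCall you would then match on methodFor(req) and issue the corresponding Ajax call. A lookup table keyed by the full path would work just as well if your method names do not follow a convention.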

My http request becomes null inside an Akka future

My server application uses Scalatra, with json4s, and Akka.
Most of the requests it receives are POSTs, and they return immediately to the client with a fixed response. The actual responses are sent asynchronously to a server socket at the client. To do this, I need to getRemoteAddr from the http request. I am trying with the following code:
case class MyJsonParams(foo:String, bar:Int)
class MyServices extends ScalatraServlet {
implicit val formats = DefaultFormats
post("/test") {
withJsonFuture[MyJsonParams]{ params =>
// code that calls request.getRemoteAddr goes here
// sometimes request is null and I get an exception
println(request)
}
}
def withJsonFuture[A](closure: A => Unit)(implicit mf: Manifest[A]) = {
contentType = "text/json"
val params:A = parse(request.body).extract[A]
future{
closure(params)
}
Ok("""{"result":"OK"}""")
}
}
The intention of the withJsonFuture function is to move some boilerplate out of my route processing.
This sometimes works (prints a non-null value for request) and sometimes request is null, which I find quite puzzling. I suspect that I must be "closing over" the request in my future. However, the error also happens with controlled test scenarios when there are no other requests going on. I would imagine request to be immutable (maybe I'm wrong?)
In an attempt to solve the issue, I have changed my code to the following:
case class MyJsonParams(foo:String, bar:Int)
class MyServices extends ScalatraServlet {
implicit val formats = DefaultFormats
post("/test") {
withJsonFuture[MyJsonParams]{ (addr, params) =>
println(addr)
}
}
def withJsonFuture[A](closure: (String, A) => Unit)(implicit mf: Manifest[A]) = {
contentType = "text/json"
val addr = request.getRemoteAddr()
val params:A = parse(request.body).extract[A]
future{
closure(addr, params)
}
Ok("""{"result":"OK"}""")
}
}
This seems to work. However, I really don't know if it still includes any bad concurrency-related programming practice that could cause an error in the future ("future" meant in its most common sense = what lies ahead :).
Scalatra is not so well suited for asynchronous code. I recently stumbled on the very same problem as you.
The problem is that scalatra tries to make the code as declarative as possible by exposing a dsl that removes as much fuss as possible, and in particular does not require you to explicitly pass data around.
I'll try to explain.
In your example, the code inside post("/test") is an anonymous function. Notice that it does not take any parameter, not even the current request object.
Instead, scalatra will store the current request object inside a thread local value just before it calls your own handler, and you can then get it back through ScalatraServlet.request.
This is the classical Dynamic Scope pattern. It has the advantage that you can write many utility methods that access the current request and call them from your handlers, without explicitly passing the request.
Now, the problem comes when you use asynchronous code, as you do.
In your case, the code inside the future block of withJsonFuture executes on a different thread from the one the handler was originally called on (it runs on a thread from the ExecutionContext's thread pool).
Thus when accessing the thread local, you are accessing a totally distinct instance of the thread local variable.
Simply put, the classical Dynamic Scope pattern is no fit in an asynchronous context.
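The effect can be reproduced without Scalatra. The sketch below assumes the framework keeps the current request in something like scala.util.DynamicVariable; currentRequest and the string "POST /test" are made-up stand-ins. It reads the value once through the dynamic lookup inside a Future and once through a local val captured on the handler thread.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._
import scala.util.DynamicVariable

object DynamicScopeDemo {
  // Hypothetical stand-in for the framework's "current request" slot.
  private val currentRequest = new DynamicVariable[String](null)

  // Returns (value seen via dynamic lookup in the Future, value captured up front).
  def run(): (Option[String], Option[String]) = {
    val pool = Executors.newSingleThreadExecutor()
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Warm up the pool so its worker thread exists before the "request" starts.
      // DynamicVariable is inherited by threads created while a value is bound,
      // which is one way the lookup can appear to work some of the time.
      Await.result(Future(()), 5.seconds)

      currentRequest.withValue("POST /test") {
        val captured = currentRequest.value                  // read on the handler thread
        val lookedUp = Future(Option(currentRequest.value))  // read on the pool thread
        val viaCapture = Future(Option(captured))            // closes over the local val
        (Await.result(lookedUp, 5.seconds), Await.result(viaCapture, 5.seconds))
      }
    } finally pool.shutdown()
  }
}
```

In this sketch the dynamic lookup on the pool thread finds nothing, while the captured local still carries the request.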
The solution here is to capture the request at the very start of your handler, and then exclusively reference that:
post("/test") {
val currentRequest = request
withJsonFuture[MyJsonParams]{ params =>
// use currentRequest here (e.g. currentRequest.getRemoteAddr), never request;
// currentRequest was captured on the handler thread, so it is not null
println(currentRequest)
}
}
Quite frankly, this is too easy to get wrong IMHO, so I would personally avoid using Scalatra altogether if you are in an asynchronous context.
I don't know Scalatra, but it's fishy that you are accessing a value called request that you do not define yourself. My guess is that it comes from extending ScalatraServlet. If that's the case, then it's probably mutable state that is being set (by Scalatra) at the start of the request and then nullified at the end. If that's happening, then your workaround is okay, as would be assigning request to another val, like val myRequest = request, before the future block and then accessing it as myRequest inside of the future and closure.
I do not know scalatra but at first glance, the withJsonFuture function returns an OK but also creates a thread via the future { closure(addr, params) } call.
If that latter thread is run after the OK is processed, the response has been sent and the request is closed/GCed.
Why create a Future to run your closure?
If withJsonFuture needs to return a Future (again, sorry, I do not know Scalatra), you should wrap the whole body of that function in a Future.
Try adding with FutureSupport to your class declaration, like this:
class MyServices extends ScalatraServlet with FutureSupport {}

Actor-based webservice - How to do it properly?

In the past few months, my colleagues and I have successfully built a server-side system for dispatching push notifications to iPhone devices. Basically, a user registers for these notifications via a RESTful webservice (Spray-Server, recently updated to use Spray-can as the HTTP layer), and the logic schedules one or multiple messages for dispatch in the future, using Akka's scheduler.
This system, as we built it, simply works: it can handle hundreds, maybe even thousands of HTTP requests a second, and can send out notifications at a rate of 23,000 per second - possibly even more if we reduce log output, add multiple notification sender actors (and thus more connections with Apple), and there might be some optimization to be done in the Java library we use (java-apns).
This question is about how to do it Right(tm). My colleague, much more knowledgeable about Scala and actor-based systems in general, noted how the application isn't a 'pure' actor-based system - and he's right. What I'm wondering now is how to do it Right.
At the moment, we have a single Spray HttpService actor, not subclassed, that is initialized with a set of directives that outlines our HTTP service logic. Currently, very much simplified, we have directives like this:
post {
content(as[SomeBusinessObject]) { businessObject => request =>
// store the business object in a MongoDB back-end and wait for the ID to be
// returned; we want to send this back to the user.
val businessObjectId = persister !! new PersistSchedule(businessObject)
request.complete("/businessObject/%s".format(businessObjectId))
}
}
Now, if I get this right, 'waiting for a response' from an actor is a no-no in actor-based programming (plus the !! is deprecated). What I believe is the 'correct' way to do it is to pass the request object over to the persister actor in a message, and have it call request.complete as soon as it's received a generated ID from the back-end.
I have rewritten one of the routes in my application to do just this; in the message that is sent to the actor, the request object / reference is also sent. This seems to work like it's supposed to:
content(as[SomeBusinessObject]) { businessObject => request =>
persister ! new PersistSchedule(request, businessObject)
}
My main concern here is that we seem to pass the request object to the 'business logic', in this case the persister. The persister now gets additional responsibility, i.e. call request.complete, and knowledge about what system it runs in, i.e. that it's part of a webservice.
What would be the correct way to handle a situation like this, so that the persister actor becomes unaware of it being part of a http service, and doesn't need to know how to output the generated ID?
I'm thinking that the request should still be passed to the persister actor, but instead of the persister actor calling request.complete, it sends a message back to the HttpService actor (a SchedulePersisted(request, businessObjectId) message), which simply calls request.complete("/businessObject/%s".format(businessObjectId)). Basically:
def receive = {
case SchedulePersisted(request, businessObjectId) =>
request.complete("/businessObject/%s".format(businessObjectId))
}
val directives = post {
content(as[SomeBusinessObject]) { businessObject => request =>
persister ! new PersistSchedule(request, businessObject)
}
}
Am I on the right track with this approach?
A smaller secondary spray-server specific question, is it okay to subclass HttpService and override the receive method, or will I break things that way? (I have no clue about subclassing actors, or how to pass unrecognized messages to the 'parent' actor)
Final question, is passing the request object / reference around in actor messages that may pass throughout the entire application an okay approach, or is there a better way to 'remember' what request should be sent a response after flowing the request through the application?
In regards to your first question, yes, you are on the right track. (Although I would also like to see some alternative ways to handle this sort of issue).
One suggestion I have is to insulate the persister actor from knowing about requests at all. You can pass the request as an Any type. Your matcher in your service code can automagically cast the cookie back into a Request.
case class SchedulePersisted(businessObjectId: String, cookie: Any)
// in your actor
override def receive = super.receive orElse {
case SchedulePersisted(businessObjectId, request: Request) =>
request.complete("/businessObject/%s".format(businessObjectId))
}
In regards to your second question, actor classes are really no different than regular classes. But you do need to make sure you call the superclass's receive method, so that it can handle its own messages. I had some other ways of doing this in my original answer, but I think I prefer chaining partial functions like this:
class SpecialHttpService extends HttpService {
override def receive = super.receive orElse {
case SpecialMessage(x) =>
// handle special message
}
}
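If the orElse chaining is unfamiliar, here is a library-free sketch of the same pattern. BaseService, SpecialService and the String result type are assumptions made so the snippet runs on its own; a real Akka receive is a PartialFunction[Any, Unit].

```scala
object ReceiveChaining {
  // Simplified stand-in for an actor's Receive; returns a String so it is easy to inspect.
  type Receive = PartialFunction[Any, String]

  final case class SpecialMessage(x: Int)

  // Plays the role of the library-provided service with its own message handling.
  class BaseService {
    def receive: Receive = { case "ping" => "pong" }
  }

  // The parent's cases are tried first; ours only see what the parent does not handle.
  class SpecialService extends BaseService {
    override def receive: Receive = super.receive orElse {
      case SpecialMessage(x) => s"special $x"
    }
  }
}
```

Messages that neither partial function matches remain undefined for the combined receive, so they are treated as unhandled just as before.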
You could also use the produce directive. It allows you to decouple the actual marshalling from the request completion:
get {
produce(instanceOf[Person]) { personCompleter =>
databaseActor ! ShowPersonJob(personCompleter)
}
}
The produce directive in this example extracts a function Person => Unit that you can use to complete the request transparently deep within the business logic layer, which should not be aware of spray.
https://github.com/spray/spray/wiki/Marshalling-Unmarshalling
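The decoupling that a completer function gives you can be shown without spray at all. In this sketch the names (CompleterDemo, handle, the hand-rolled JSON string) are made up for illustration; the point is only that the business layer receives a Person => Unit and nothing that refers to HTTP.

```scala
object CompleterDemo {
  final case class Person(name: String)

  // Message for the business layer: it carries a callback, not a request object.
  final case class ShowPersonJob(complete: Person => Unit)

  // Hypothetical business logic; it knows nothing about requests or marshalling.
  def handle(job: ShowPersonJob): Unit = job.complete(Person("Ada"))

  // Plays the role of the HTTP layer: builds the completer and hands it over.
  def run(): String = {
    var response = ""
    val completer: Person => Unit = p => response = s"""{"name":"${p.name}"}"""
    handle(ShowPersonJob(completer))
    response
  }
}
```

Compared with passing the request (or an Any cookie) through the actors, the callback keeps the type information and leaves the decision about how to render the result with the HTTP layer.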