Organizing and analyzing logs in an asynchronous Scala web application

In the old days, when each request to a web application was handled by one thread, it was fairly easy to understand the logs. One could, for example, use a servlet filter to name the thread that was handling a request with some sort of request id. This request id then could be output in the logs. In this world, a simple grep was all it took to collect the log lines for a given request.
In my current position, I'm building web applications with Scala (we're using Scalatra, but that isn't specifically relevant to my question). Each request creates a scala.concurrent.Future and is then parked until that future has completed. The important point is that up to three threads may be involved: the thread that accepts the request, the thread that runs the business logic, and (I think) the thread that completes the request. The context of the request is therefore lost during processing: the business logic can log all it likes, but it is hard to associate that logging with the specific request it relates to.
Now, from the standpoint of supporting my web services in production, the old approach was great and I'd like to come up with something similar for my asynchronous services. I've been trying to find a way to do it but have come up empty. That is, I haven't found anything nearly as lightweight as the old name-the-thread model. Does the Stack Overflow crowd have any suggestions?

As you have written, assign an id to each request and pass it to the business logic functions. You can also do this with an implicit parameter, so the id does not have to be passed explicitly at every call site and your code won't be cluttered.
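A minimal sketch of the implicit-parameter approach, using only the standard library. The names RequestId, RequestLog, OrderService and ImplicitRequestIdDemo are hypothetical and chosen for illustration; RequestLog.line stands in for a real logger call.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical wrapper type; a dedicated type avoids clashing with other implicit Strings.
final case class RequestId(value: String)

object RequestLog {
  // Stand-in for a real logger call: prefixes every line with the request id.
  def line(msg: String)(implicit rid: RequestId): String = s"[${rid.value}] $msg"
}

object OrderService {
  // Business logic declares the id as an implicit parameter,
  // so callers do not have to pass it explicitly.
  def loadOrder(orderId: Int)(implicit rid: RequestId, ec: ExecutionContext): Future[String] =
    Future(RequestLog.line(s"loading order $orderId"))
}

object ImplicitRequestIdDemo {
  // Plays the role of the request handler: creates the id once, then calls the business logic.
  def handle(): String = {
    import ExecutionContext.Implicits.global
    implicit val rid: RequestId = RequestId("req-42")
    Await.result(OrderService.loadOrder(7), 5.seconds)
  }
}
```

Because the id travels as a value captured by the Future's closure, it does not matter which thread ends up running the body. The cost is that every method on the call path has to declare the implicit parameter.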

This should be possible with the MDC (Mapped Diagnostic Context) logging available in SLF4J, which uses thread-local storage to hold the context of each request.
Because the context is thread-local, you will also have to create an MDC-propagating execution context that copies the context across threads.
This post describes it well:
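Independently of the linked post, here is a minimal sketch of what a context-propagating execution context can look like. To keep it runnable without extra dependencies, a plain ThreadLocal named LogContext stands in for SLF4J's MDC; that object, PropagatingExecutionContext and PropagationDemo are hypothetical names. With SLF4J on the classpath, the same wrapper would presumably use MDC.getCopyOfContextMap and MDC.setContextMap in place of LogContext.get and LogContext.set.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Stand-in for SLF4J's MDC: a thread-local map of context values.
object LogContext {
  private val local = new ThreadLocal[Map[String, String]] {
    override def initialValue(): Map[String, String] = Map.empty
  }
  def get: Map[String, String] = local.get()
  def set(ctx: Map[String, String]): Unit = local.set(ctx)
}

// Wraps a delegate ExecutionContext so each task runs with the submitter's context.
final class PropagatingExecutionContext(delegate: ExecutionContext) extends ExecutionContext {
  override def execute(task: Runnable): Unit = {
    val captured = LogContext.get // read on the submitting thread
    delegate.execute(new Runnable {
      override def run(): Unit = {
        val previous = LogContext.get // whatever the worker thread held before
        LogContext.set(captured)
        try task.run()
        finally LogContext.set(previous) // restore so pooled threads do not leak context
      }
    })
  }

  override def reportFailure(cause: Throwable): Unit = delegate.reportFailure(cause)
}

object PropagationDemo {
  // Sets a request id on the calling thread and reads it back inside a Future.
  def requestIdSeenByFuture(): Option[String] = {
    implicit val ec: ExecutionContext = new PropagatingExecutionContext(ExecutionContext.global)
    LogContext.set(Map("requestId" -> "req-42"))
    try Await.result(Future(LogContext.get.get("requestId")), 5.seconds)
    finally LogContext.set(Map.empty)
  }
}
```

The wrapper only helps for work submitted through it, so every Future in the request path needs to run on the propagating context rather than on the bare delegate.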


How to get a kill switch per user for Akka Http Websocket connection?

I'm new to Akka and Scala and self learning this to do a small project with websockets. End goal is simple, make a basic chat server that publishes + subscribes messages on some webpage.
In fact, after perusing their docs, I already found the pages that are relevant to my goal, namely this and this.
Using dynamic junctions (aka MergeHub & BroadcastHub), and the Flow.fromSinkAndSource() method, I was able to achieve a very basic example of what I wanted. We can even get a kill switch using the example from the Akka docs, which I have shown below. The code looks like this:
private lazy val connHub: Flow[Message, Message, UniqueKillSwitch] = {
  // run() needs an implicit Materializer (or ActorSystem) in scope
  val (sink, source) = MergeHub.source[Message].toMat(BroadcastHub.sink[Message])(Keep.both).run()
  Flow.fromSinkAndSourceCoupled(sink, source).joinMat(KillSwitches.singleBidi[Message, Message])(Keep.right)
}
However, I now see one issue. The above will return a Flow that will be used by Akka's websocket directive: akka.http.scaladsl.server.Directives.handleWebSocketMessages(/* FLOW GOES HERE */)
That means the akka code itself will materialize this flow for me so long as I provide it the handler.
But let's say I wanted to arbitrarily kill one user's connection through a KillSwitch (maybe because their session has expired on my application). While a user's websocket would be added through the above handler, since my code would not be explicitly materializing that flow, I won't get access to a KillSwitch. Therefore, I can't kill the connection, only the user can when they leave the webpage.
It's strange to me that the docs would mention the kill switch method without showing how I would get one using the websocket api.
Can anyone suggest a solution as to how I could obtain the kill switch per connection? Do I have a fundamental misunderstanding of how this should work?
Thanks in advance.
I'm very happy to say that after a lot of time, research, and coding, I have an answer for this question. In order to do this, I had to post in the Akka Gitter as well as the Lightbend discussion forum. Please refer to the amazing answer I got there for some perspective on the problem and some solutions. I'll summarize that here.
In order to get the UniqueKillSwitch from the code that I was using, I needed to use the mapMaterializedValue() method on the Flow that I was returning. Here is the code that I'm now using to return a Flow to the handleWebSocketMessages directive:
// note - state will not be updated properly if cancellation events come through from the client side as user->killswitch mapping may still remain in concurrent map even if the connections are closed
Flow.fromSinkAndSourceCoupled(mergeHubSink, broadcastHubSource)
  .joinMat(KillSwitches.singleBidi[Message, Message])(Keep.right)
  .mapMaterializedValue { killSwitch =>
    connections.put(user, killSwitch) // add kill switch in side effect once value is ready from materialization
  }
The above code lives in a Chatroom class I've created that has access to the materialized MergeHub sink and BroadcastHub source. It also has access to a concurrent hash map that maps each user to their kill switch. In this way, we can now look up the kill switch for a user in the map. From there, you can call switch.shutdown() to kill the user's connection from the server side.
My main issue was that I originally thought I could get the switch directly even though I didn't control the materialization. This doesn't seem possible. I suggest this method for when you know that the caller that requires your Flow doesn't care about the materialized value (aka the kill switch).
Please reference the answer I've linked for more scenarios and ways to handle this problem.

Is it acceptable to model an event queue as a restful service?

I have been looking at RESTful Web Services and was wondering about modelling an event queue in REST.
Assuming the event queue is accessible at URL: http://my.domain/events, it seems to me that a POST operation applied to this URL is okay because it will add the event to the end of the list that represents the queue. Further, if I perform a GET operation on this URL, it seems to me that returning the head of queue also is okay.
My question is - is it okay for the GET operation to also remove the head of the queue or should this be performed by a separate DELETE operation?
"Is it okay for the GET operation to also remove the head of the queue?"
No, not from a REST perspective. A GET request should be safe according to REST best practices: making any number of GET requests to a URL should have the same effect on the resource as making no requests at all.
There's one more concern about your design. There are usually two common patterns to retrieve a queue head:
The first one is to just get the head, process it, and then notify the queue to remove the message if it was processed successfully. If it was not, the message goes back to the queue to be processed again later. This is the more robust approach.
The second one is to just get a queue head and remove it at the same time just like you described in your question.
To support both patterns, I think you should only retrieve a message when doing GET, and implement a DELETE method that returns the deleted message object as its response. This way you will comply with the REST uniform interface and your queue client will be able to implement both patterns.
Hope it helps!
Do your integrity requirements allow GET + DELETE in one step?
Events normally should not get lost. What happens if the response retrieval fails after the delete was executed?
I would GET the head of the queue and then send an acknowledgement containing the ID of the event that was received and successfully processed. That way, you guarantee at-least-once delivery.
Depending on the number of events you are processing, a message bus might be the more suitable option here.
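The peek-then-acknowledge pattern described in the answers above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: Event, EventQueue and the mapping of methods to HTTP verbs are hypothetical, the model is single-threaded, and no HTTP layer is included.

```scala
import scala.collection.immutable.Queue

// Hypothetical event type; `id` is what the consumer sends back as its acknowledgement.
final case class Event(id: Long, payload: String)

// Minimal single-threaded model of the resource behind /events.
final class EventQueue {
  private var events: Queue[Event] = Queue.empty
  private var nextId: Long = 1L

  // POST /events: append to the tail of the queue.
  def post(payload: String): Event = {
    val event = Event(nextId, payload)
    nextId += 1
    events = events.enqueue(event)
    event
  }

  // GET /events: safe, returns the head without changing the queue.
  def peek: Option[Event] = events.headOption

  // DELETE carrying the event id: removes the head only if the id matches,
  // and returns the removed event so it can be sent as the response body.
  def acknowledge(id: Long): Option[Event] = events.headOption match {
    case Some(head) if head.id == id =>
      events = events.tail
      Some(head)
    case _ => None
  }
}
```

If the consumer crashes between peek and acknowledge, the event stays at the head and is delivered again, which is where the at-least-once guarantee comes from. Repeating an acknowledgement for an event that is already gone returns None instead of removing a different event.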
Do not become an overzealous REST paradigm worshipper. REST is an architectural style, not a protocol, and it does not necessarily need to convey the contract of the service.
What you describe is perfectly fine as long as the contract between the consumer and the queue is clear and documented.

What should be returned from the API for CQRS commands?

As far as I understand, in a CQRS-oriented API exposed over RESTful HTTP, the commands and queries are expressed through the HTTP verbs, the commands being asynchronous and usually returning 202 Accepted, while the queries get the information you need. Someone asked me the following: supposing they want to change some information, they would have to send a command and then a query to get the resulting state. Why force the client to make two HTTP requests when you can simply return what they want in the HTTP response of the command, in a single HTTP request?
We had a long conversation on the DDD/CQRS mailing list a couple of months ago (link). One part of the discussion was about "one-way commands", and this is what I think you are assuming. You will find that Greg Young is opposed to this pattern. A command changes the state and is therefore prone to failure, and you should support that. A REST API with POST/PUT requests provides perfect support for this, but you should not just return 202 Accepted; give some meaningful result back. Some people return 200 OK along with an object that contains a URL to retrieve the newly created or updated object. If the command handler fails, it should return 500 and an error message.
Having fire-and-forget commands is dangerous since it can give a consumer wrong ideas about the system state.
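As a rough sketch of a command that reports a meaningful result instead of a bare 202, consider the following. All names here (CreateWidget, CommandResult, WidgetCommands, the `store` function and the /widgets URL scheme) are hypothetical; the status codes follow the answer above.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical command and result types for illustration.
final case class CreateWidget(name: String)

sealed trait CommandResult
final case class Succeeded(location: String) extends CommandResult
final case class Failed(reason: String) extends CommandResult

object WidgetCommands {
  // The handler reports success or failure instead of returning nothing.
  // `store` is an assumed persistence function that returns the new id or throws.
  def handle(cmd: CreateWidget, store: String => Int): CommandResult =
    Try(store(cmd.name)) match {
      case Success(id)    => Succeeded(s"/widgets/$id")
      case Failure(error) => Failed(error.getMessage)
    }

  // Maps the result to (status code, body): 200 with a URL to the
  // created resource, or 500 with an error message.
  def toHttp(result: CommandResult): (Int, String) = result match {
    case Succeeded(location) => (200, location)
    case Failed(reason)      => (500, reason)
  }
}
```

The response body carries only a URL, not the read model itself, so the write side still does not depend on how projections are built.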
My team also recently had a very heated discussion about this very thing. Thanks for posting the question. I have usually been the defender of "fire and forget" style commands. My position has always been that, if you want to be able to move to an async command dispatcher some day, then you cannot allow commands to return anything. Doing so would kill your chances, since an async command doesn't have much of a way to return a value to the original HTTP call. Some of my teammates really challenged this thinking, so I had to start asking myself whether my position was really worth defending.
Then I realized that async or not async is JUST an implementation detail. This led me to realize that, using our frameworks, we can build in middleware to accomplish the same thing our async dispatchers are doing. So we can build our command handlers the way we want to, returning whatever makes sense, and then let the framework around the handlers deal with the "when".
Example: My team is currently building an HTTP API in node.js. Instead of requiring a POST command to only return a blank 202, we are returning details of the newly created resource. This helps the front-end move on. The front-end POSTs a widget and opens a channel to the server's web socket using the same command as the channel name. The request comes to the server and is intercepted by middleware, which passes it to the service bus. When the command is eventually processed synchronously by the handler, it "returns" via the web socket and the front-end is happy. The middleware can be disabled easily, making the API synchronous again.
There is nothing stopping you from doing that. If you execute your commands synchronously and create your projections synchronously, then it will be easy for you to just make a query directly after executing the command and returning that result. If you do this asynchronously via the rest-api, then you have no query result to send back. If you do it asynchronously within your system, then you can wait for the projection to be created and then send the response to the client.
The important thing is that you separate your write and read models in classic CQRS style. That does not mean that you cannot do a read in the same request as the command. Sure, you can send a command to the server and then, with SignalR (or something similar), wait for a notification that your projection has been created or updated. I do not see a problem with waiting for the projection to be created on the server side instead of on the client.
How you do this will affect your infrastructure and error handling. Also, you will hold the HTTP request open for a longer time if you return the result at once.

Scheduling/delaying of jobs/tasks in Play framework 2.x app

In a typical web application, there are some things that I would prefer to run as delayed jobs/tasks. They tend to have some or all of the following properties:
Takes a long time (anywhere from multiple seconds to multiple minutes to multiple hours).
Occupy some resource heavily (CPU, network, disk, external API limits, etc.)
Result not immediately necessary. Can complete HTTP response without it. OK (and possibly even preferable) to delay until later.
Can be (and possibly preferable to) run on (a) different machine(s) than web server(s). The machine(s) are potentially dedicated job/task runners.
Should be run in response to other event(s), or started periodically.
What would be the preferred way(s) to set up, enqueue, schedule, and run delayed jobs/tasks in a Scala + Play Framework 2.x app?
For more details...
The pattern I have used in the past, and which I would like to replicate if applicable, is:
In handler of web request, or in cron-like call, enqueue job(s)
In job runner(s), repeatedly dequeue and run one job at a time
Possibly handle recording job results
This seems to be a relatively simple yet still relatively flexible pattern.
Examples I have encountered in the past include:
Updating derived data in DB
Analytics/tracking API calls for a web request
Delete expired sessions or other stale/outdated DB records
Periodic batch ETLs
In other languages/frameworks, I would typically use a job/task framework. Examples include:
Resque in a Ruby + Rails app
Celery in a Python + Django app
I have found the following existing materials, but unfortunately, I don't think they fit my use case directly.
Play 1.x asynchronous jobs API (+ various SO questions referencing it). Appears to have been removed in 2.x line. No reference to what replaced it.
Play 2.x Akka integration. Seems very general-purpose. I'd imagine it's possible to use Akka for the above, but I'd prefer not to write a jobs/tasks framework if one already exists. Also, no info on how to separate the job runner machine(s) from your web server(s).
This SO answer. Seems potentially promising for the "short to medium duration IO bound" case, e.g. analytics calls, but not necessarily for the "CPU bound" case (probably shouldn't tie up CPU on web server, prefer to ship off to different node), the "lots of network" case, or the "multiple hour" case (probably shouldn't leave that in the background on the web server, even if it isn't eating up too many resources).
This SO question, and related questions. Similar to above, it seems to me that this covers only the cases where it would be appropriate to run on the same web server.
Some further clarification on use-cases (as per commenters' request). There are two main use-cases that I have experienced with something like resque or celery that I am trying to replicate here:
Some event on the site (Most often, an incoming web request causes task to be enqueued.)
Task should run periodically. (Most often, this is implemented as: periodically, enqueue task to be run as above.)
In the case of resque or celery, the tasks enqueued by both use-cases enter queues the same way and are treated the same way by the runner/worker process. Barring other Scala or Play-specific considerations, that would be my initial guess for how to approach this.
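To make the shape of that pattern concrete, here is a minimal in-process sketch in which request-triggered and periodic jobs enter the same queue and are treated identically by the runner. Job and JobQueue are hypothetical names. This is not an existing Play or Akka API, and it does not address running jobs on a separate machine, which would require an external queue or broker.

```scala
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical job type: a name plus the work to run.
final case class Job(name: String, run: () => Unit)

final class JobQueue {
  private val pending = new ConcurrentLinkedQueue[Job]()

  // Called from a web request handler or from a periodic trigger;
  // both use the same path into the queue.
  def enqueue(job: Job): Unit = {
    pending.add(job)
    ()
  }

  // Called repeatedly by a runner: dequeues and runs a single job.
  // Returns the name of the job that ran, or None if the queue was empty.
  def runNext(): Option[String] =
    Option(pending.poll()).map { job =>
      job.run()
      job.name
    }
}
```

A runner would call runNext() in a loop, and a periodic trigger would only ever call enqueue, so the worker does not need to know why a job was scheduled.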
Some further clarification on why I do not believe the Akka scheduler fits my use case out-of-the-box (as per commenters' request):
While it is no doubt possible to construct a fitting solution using some combination of the Akka scheduler (for periodic jobs), akka-remote and akka-cluster (for communicating between the job caller and the job runner), that approach requires a certain amount of glue code which is almost a delayed job framework in and of itself. If it exists, I would prefer to use an existing out-of-the-box solution rather than reinvent the wheel.

How to use a WF DelayActivity in an ASP.Net web based workflow

I have a web application that I am adding workflow functionality to using Windows Workflow Foundation. I have based my solution around K. Scott Allen's Orders Workflow example on OdeToCode. At the start I didn't realise the significance of the caveat "if you use Delay activities with and configure active timers for the manual scheduling service, these events will happen on a background thread that is not associated with an HTTP request". I now need to use Delay activities and it doesn't work as is with his solution architecture. Has anyone come across this and found a good solution to this? The example is linked to from a lot of places but I haven't seen anyone else come across this issue and it seems like a bit of a show stopper to me.
Edit: The problem is that the results from the workflow are returned to the web application via HttpContext. I am using the ManualWorkflowSchedulerService with useActiveTimers, and this works fine for most situations because workflow events are fired from the web app, so HttpContext still exists when the workflow results are returned and the web app can continue processing. When a Delay activity is used, processing happens on a background thread, and when it tries to return results to the web app there is no valid HttpContext (because there has been no HTTP request), so further processing fails. That is, the web app is trying to process the workflow results but there has been no HTTP request.
I think I need to do all post Delay activity processing within the workflow rather than handing off to the web app.
You didn't describe the problem you are having. But maybe this is of some help.
You can use the ManualWorkflowSchedulerService with the useActiveTimers and the workflow will continue on another thread. Normally this is fine because your HTTP request has already finished and it doesn't really matter.
If, however, you need full control, the workflow runtime will let you get a handle on all loaded workflows using the GetLoadedWorkflows() function. This returns a collection of WorkflowInstance objects. Using these, you can call GetWorkflowNextTimerExpiration() on each to check whether its timer has expired. If it has, you can manually resume that workflow. In this case you want to use the ManualWorkflowSchedulerService with useActiveTimers=false so you can control the last thread as well. However, in most cases useActiveTimers=true works perfectly well.