Use Ack on chunked responses with Spray - scala

I'm using spray-can 1.2.1.
I'm streaming big files from/to a storage, I use both chunked requests and chunked responses for that.
For chunked requests I use the built-in ack mechanism in my actor to make sure each chunk has been written before sending more:
connection ! MessageChunk(data).withAck(ChunkSent)
Here connection is the IO actor provided by Spray and Akka; I can then wait for a ChunkSent before sending the next chunk. Good.
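For reference, the sending side looks roughly like this (a condensed sketch; ChunkedUploader and chunks are illustrative names of mine, not spray API):
    import akka.actor.{Actor, ActorRef}
    import spray.http.{ChunkedMessageEnd, MessageChunk}

    case object ChunkSent // custom ack token

    class ChunkedUploader(connection: ActorRef, chunks: Iterator[Array[Byte]]) extends Actor {
      sendNext() // send the first chunk as soon as the actor starts

      def sendNext(): Unit =
        if (chunks.hasNext) connection ! MessageChunk(chunks.next()).withAck(ChunkSent)
        else connection ! ChunkedMessageEnd

      def receive = {
        case ChunkSent => sendNext() // previous chunk written, safe to send the next
      }
    }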
I'm struggling to reproduce the same behavior with chunked responses. I can send my HttpRequest and then receive a ChunkedResponseStart, followed by a bunch of MessageChunks and finally a ChunkedMessageEnd but is there a way to force Spray to wait for me to send an ack after each MessageChunk before sending the next one?
Edit: Just to be a bit more clear: I use spray-can as a client in this case, I am not the server, the server is the storage I mentioned before.

Well-put question. Currently, you cannot make spray (1.x.1) wait for Acks before continuing to read.
What you can do, however, is send Tcp.SuspendReading and Tcp.ResumeReading commands to the client connection (the sender of the chunks) to instruct the Akka IO TCP layer to stop reading while you are overloaded. See this proof-of-concept, which tries to add acking for the receive side (for the server, but it should work similarly on the client side) on top of SuspendReading/ResumeReading, for hints about how to build something with the current version of spray.
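The receiving side would then look something along these lines (a rough sketch only; processChunk is an assumed asynchronous handler, and how the commands reach the TCP layer depends on your spray version, see the proof-of-concept above):
    import akka.actor.Actor
    import akka.io.Tcp
    import spray.http.{ChunkedMessageEnd, ChunkedResponseStart, MessageChunk}
    import scala.concurrent.Future

    class ThrottledReceiver extends Actor {
      import context.dispatcher

      def processChunk(chunk: MessageChunk): Future[Unit] =
        Future { /* write the chunk to storage, a slow operation */ }

      def receive = {
        case ChunkedResponseStart(_) => // response started, nothing to do yet
        case chunk: MessageChunk =>
          val connection = sender()
          connection ! Tcp.SuspendReading // stop reading from the socket while busy
          processChunk(chunk).onComplete { _ =>
            connection ! Tcp.ResumeReading // ready for the next chunk
          }
        case _: ChunkedMessageEnd => // stream finished
      }
    }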
This situation is clearly not optimal, especially under load, because 1) you need to figure out that you are overloaded in the first place, and 2) those messages may be stuck in the message queue of the TCP connection for a while before they are handled.
There are two things that will improve the situation in the future:
Akka IO recently gained a "pull mode" in which the TCP connection will only ever read once and then wait for a Tcp.ResumeReading command (basically an ack). However, this is not available for use in spray (and probably won't be).
What we are focusing on right now for Akka HTTP is streaming support. The plan is to introduce a new API (next to what we have in spray-can) that makes working with streams natural for HTTP and supports automatic back-pressure without any need for extra user code. Alas, it is not yet ready.

Related

Websocket vs REST when sending data to server

Background
We are writing a Messenger-like app. We have set up WebSockets for Inbox and Chat.
Question
My question is simple. What are the advantages and disadvantages when sending data from Client to Server using REST instead of Websockets? (I am not interested in updates now.)
We know that REST has higher overhead in terms of message sizes and that WS is duplex (thus open all the time). What about the other things we didn't keep in mind?
Here's a summary of the tradeoffs I'm aware of.
Reasons to use webSocket:
You need/want server-push of data.
You are sending lots of small pieces of data from client to server and doing it very regularly. Using webSocket has significantly less overhead per transmission.
Reasons to use REST:
You want to use server-side frameworks or modules that are built for REST, not for webSocket (such as auth, rate limiting, security, streaming, etc...).
You aren't sending data very often from client to server and thus the server-side burden of keeping a webSocket connection open all the time may lessen your server scalability.
You want your client to run in places where a long-connected webSocket during inactive periods of time may not be practical (perhaps mobile).
You want your client to run in old browsers that don't support webSocket.
You want the browser to enforce same-origin restrictions (those are enforced for REST Ajax calls, but not for webSocket connections).
You don't want to have to write code that detects when the webSocket connection has died and then auto-reconnects, handles back-offs, deals with mobile battery-usage issues, etc...
You need to run in situations where there are proxies or other network infrastructure that may not support long running webSocket connections.
If you want request/response built in. REST is request/response; webSocket is not - it's message based. Responses from a webSocket are done by sending a message back, and that message is not, by itself, a response to any specific request - it's just data being sent back. If you want request/response with webSocket, then you have to build some infrastructure yourself where you tag an id onto a request and make the response for that particular request carry that same id. Otherwise, if there are ever multiple requests in flight at the same time, you don't know which response belongs with which request, because all the data is being sent over the same connection and you would have no way of matching a response with its request (see the sketch after this list).
If you want other clients to be able to carry out this operation via an Ajax call.
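As a rough illustration of that id-tagging scheme (a library-agnostic sketch; every name here is made up), each outgoing request carries an id, and incoming messages are matched back to the pending request with the same id:
    import java.util.concurrent.ConcurrentHashMap
    import java.util.concurrent.atomic.AtomicLong
    import scala.concurrent.{Future, Promise}

    class RequestResponseSocket(send: String => Unit) {
      private val nextId  = new AtomicLong(0)
      private val pending = new ConcurrentHashMap[Long, Promise[String]]()

      def request(payload: String): Future[String] = {
        val id = nextId.incrementAndGet()
        val p  = Promise[String]()
        pending.put(id, p)
        send(s"""{"id":$id,"payload":"$payload"}""") // tag the request with its id
        p.future
      }

      // call this for every message arriving on the socket
      def onMessage(id: Long, payload: String): Unit =
        Option(pending.remove(id)).foreach(_.success(payload))
    }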
So, if you already have a webSocket implementation, don't have any problems with it that would be lessened by REST, and aren't interested in any of the reasons that REST might be better, then stick with your webSocket implementation.
Related references:
websocket vs rest API for real time data?
Ajax vs Socket.io
Adding comments per your request:
It sounds like you're expecting someone to tell you the "right" way to do it. There are reasons to pick one way over the other. If none of those reasons compels you one way or the other, then it's just an architectural choice, and you must take in the whole context of what you are doing and decide which architectural choice makes more sense to you. If you already have a reliably established webSocket connection and none of the advantages of REST apply to your situation, then you can optimize for "efficiency" and send your data to the server over the webSocket connection.
On the other hand, if you wanted there to be a simple API on your server that could be reached with an Ajax call from other clients, then you'd want your server to support this operation via REST, so it would be simplest for those other clients to carry out this one operation. So, it all depends upon which direction your requirements drive you, and if there is no particular driving reason to go one way or the other, you just make an architectural choice yourself.

What happens to messages that reach a server implementing stream processing after the source has reached its bound?

I'm learning akka streams, but obviously it's relevant to any streaming framework :)
Quoting the akka documentation:
Reactive Streams is just to define a common mechanism of how to move data across an asynchronous boundary without losses, buffering or resource exhaustion
Now, from what I understand: before streams (taking an http server as an example), a request would come in, and if the receiver wasn't finished with a previous request, the new incoming requests would be collected in a buffer holding the waiting requests. The problem is that this buffer has an unknown size, and at some point, if the server is overloaded, we can lose requests that were waiting there.
Then stream processing came into play, and this buffer was bounded so as to be controllable... so we can predefine the number of messages (requests in my example) we want to have in line, and we can take care of each one at a time.
My question: if we decide that a source in our server can hold at most 3 messages, what happens when the 4th one comes in?
I mean, when another server calls us and we are already taking care of 3 requests... what will happen to its request?
What you're describing is not actually the main problem that Reactive Streams implementations solve.
Backpressure in terms of the number of requests is solved with regular networking tools. For example, in Java you can configure the thread pool of a networking library (for example Netty) to some parallelism level, and the library will take care of accepting as many requests as possible. Or, if you use the synchronous sockets API, it is even simpler: you can postpone calling accept() on the server socket until all of the currently connected clients are served (see the sketch below). In either case, there is no "buffer" on either side; it's just that until the server accepts a connection, the client will be blocked (either inside a system call for blocking APIs, or in an event loop for async APIs).
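A minimal sketch of that synchronous-sockets variant (the port and capacity here are arbitrary): the server simply does not call accept() until it has capacity, so waiting clients queue up in the OS backlog rather than in any application-level buffer.
    import java.net.ServerSocket
    import java.util.concurrent.{Executors, Semaphore}

    object BoundedServer extends App {
      val server   = new ServerSocket(8080)
      val capacity = new Semaphore(3) // serve at most 3 clients at a time
      val pool     = Executors.newCachedThreadPool()

      while (true) {
        capacity.acquire() // blocks here: a 4th client waits in the OS backlog
        val client = server.accept()
        pool.submit(new Runnable {
          def run(): Unit =
            try { /* serve the client */ }
            finally { client.close(); capacity.release() }
        })
      }
    }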
What Reactive Streams implementations solve is how to handle backpressure inside a higher-level data pipeline. Reactive streams implementations (e.g. akka-streams) provide a way to construct a pipeline of data in which, when the consumer of the data is slow, the producer will slow down automatically as well, and this would work across any kind of underlying transport, be it HTTP, WebSockets, raw TCP connections or even in-process messaging.
For example, consider a simple WebSocket connection, where the client sends a continuous stream of information (e.g. data from some sensor), and the server writes this data to some database. Now suppose that the database on the server side becomes slow for some reason (networking problems, disk overload, whatever). The server now can't keep up with the data the client sends, that is, it cannot save it to the database in time before the new piece of data arrives. If you're using a reactive streams implementation throughout this pipeline, the server will signal to the client automatically that it cannot process more data, and the client will automatically tweak its rate of producing in order not to overload the server.
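To get a feel for how this looks in code, here is a tiny akka-streams sketch of a pipeline with the same shape (slowWrite is an illustrative stand-in for the slow database call; the slow stage automatically backpressures the fast source, with no explicit ack code anywhere):
    import akka.actor.ActorSystem
    import akka.stream.ActorMaterializer
    import akka.stream.scaladsl.{Sink, Source}
    import scala.concurrent.Future

    object BackpressureDemo extends App {
      implicit val system = ActorSystem("demo")
      implicit val mat    = ActorMaterializer()
      import system.dispatcher

      def slowWrite(n: Int): Future[Int] =
        Future { Thread.sleep(100); println(s"stored $n"); n } // pretend this is the DB

      Source(1 to 1000)                       // fast producer (the "client")
        .mapAsync(parallelism = 1)(slowWrite) // slow consumer (the "database")
        .runWith(Sink.ignore)                 // demand propagates upstream automatically
    }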
Naturally, this can be done without any Reactive Streams implementation, e.g. by manually controlling acknowledgements. However, as with many other libraries, Reactive Streams implementations solve this problem for you. They also provide an easy way to define such pipelines, and usually they have interfaces for various external systems like databases. In particular, such libraries may implement backpressure at the lowest level, down to the TCP connection, which may be hard to do manually.
As for Reactive Streams itself, it is just a description of an API which can be implemented by a library. It defines common terms and behavior and allows such libraries to be interchangeable or to interact easily: e.g. you can connect an akka-streams pipeline to a Monix pipeline using the interfaces from the specification, and the combined pipeline will work seamlessly, supporting all of the backpressure features of Reactive Streams.

Moving from socko to akka-http websockets

I have an existing akka application built on socko websockets. Communication with the sockets takes place inside a single actor, and messages both leaving and entering the actor (outgoing and incoming messages, respectively) are labelled with the socket id, which is a first-class property of a socko websocket (in socko, a connection request arrives labelled with the id, and all the lifecycle transitions such as handshaking, disconnection, incoming frames etc. are similarly labelled).
I'd like to reimplement this single actor using akka-http (socko is more-or-less abandonware these days, for obvious reasons), but it's not straightforward because the two libraries are conceptually very different: akka-http hides the lower-level details of handshaking, disconnection etc., simply sending whichever actor was bound to the http server an UpgradeToWebSocket request header. The header object contains a method that takes a materialized Flow as the handler for all messages exchanged with the client.
So far, so good; I am able to receive messages on the websocket and reply to them directly. The official examples all assume some kind of stateless request-reply model, so I'm struggling to understand how to take the next step: assigning a label to the materialized flow and managing its lifecycle and connection state (I need to inform other actors in the application when a connection is dropped by a client, as well as label the messages).
The alternative (remodelling the whole application using akka-streams) is far too big a job, so any advice about how to keep track of the sockets would be much appreciated.
To interface with an existing actor-based system, you should look at Source.actorRef and Sink.actorRef. Source.actorRef creates an ActorRef that you can send messages to, and Sink.actorRef allows you to process the incoming messages using an actor and also to detect closing of the websocket.
To connect the actor created by Source.actorRef to the existing long-lived actor, use Flow#mapMaterializedValue. This would also be a good place to assign a unique id to a socket connection.
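A rough sketch of this wiring (pre-akka-2.6 Source.actorRef/Sink.actorRef signatures; Connected, IncomingMessage and Disconnected are illustrative application-protocol messages, not akka-http types):
    import akka.actor.ActorRef
    import akka.http.scaladsl.model.ws.{Message, TextMessage}
    import akka.stream.OverflowStrategy
    import akka.stream.scaladsl.{Flow, Sink, Source}
    import java.util.UUID

    case class Connected(id: String, outgoing: ActorRef)
    case class IncomingMessage(id: String, text: String)
    case class Disconnected(id: String)

    def socketFlow(app: ActorRef): Flow[Message, Message, Any] = {
      val id = UUID.randomUUID().toString // the label for this connection

      val in: Sink[Message, Any] =
        Flow[Message]
          .collect { case TextMessage.Strict(text) => IncomingMessage(id, text) }
          .to(Sink.actorRef[IncomingMessage](app, onCompleteMessage = Disconnected(id)))

      val out: Source[Message, ActorRef] =
        Source.actorRef[TextMessage](bufferSize = 16, OverflowStrategy.dropHead)
          .mapMaterializedValue { outgoing =>
            app ! Connected(id, outgoing) // hand the outgoing side to the long-lived actor
            outgoing
          }

      Flow.fromSinkAndSource(in, out)
    }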
This answer to a related question might get you started.
One thing to be aware of: the current websocket implementation does not close the server-to-client flow when the client-to-server flow is closed with a websocket close message. There is an issue open to implement this, but until it is implemented you have to do it yourself, for example by having something like this in your protocol stack.
The answer from Rüdiger Klaehn was a useful starting point, thanks!
In the end I went with ActorPublisher after reading another question here (Pushing messages via web sockets with akka http).
The key thing is that the Flow is 'materialized' somewhere under the hood of akka-http, so you need to pass into UpgradeToWebSocket.handleMessagesWithSinkSource a Source/Sink pair that already knows about an existing actor. So I create an actor (which implements ActorPublisher[TextMessage.Strict]) and then wrap it in Source.fromPublisher(ActorPublisher(myActor)).
When you want to inject a message into the stream from the actor's receive method, you first check whether totalDemand > 0 (i.e. the stream is willing to accept input) and, if so, call onNext with the contents of the message.
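A condensed sketch of that arrangement (SocketPublisher is an illustrative name; note that ActorPublisher was deprecated in later akka versions, but it matches the API discussed here):
    import akka.http.scaladsl.model.ws.TextMessage
    import akka.stream.actor.ActorPublisher

    class SocketPublisher extends ActorPublisher[TextMessage.Strict] {
      def receive = {
        case text: String =>
          if (totalDemand > 0) onNext(TextMessage.Strict(text)) // stream accepts input
          // else: drop the message, or buffer it locally until demand returns
      }
    }

    // Wiring, wherever the UpgradeToWebSocket header is handled:
    //   val ref    = system.actorOf(Props[SocketPublisher])
    //   val source = Source.fromPublisher(ActorPublisher[TextMessage.Strict](ref))
    //   upgrade.handleMessagesWithSinkSource(Sink.ignore, source)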

How implement real-time bidirectional HTTP communication on top of Netty 4 using AHC

I'm writing a client using AsyncHttpClient (AHC) v2.0beta (with Netty 4 as the provider) that streams audio in real time, and it needs to receive server data in real time too (while streaming). Imagine an HTTP client streaming the microphone's output as the user speaks and receiving the audio transcription as it happens, in real time. In short, it's bidirectional real-time communication over HTTP (chunked multipart request/response).
In order to do that, I had to hack AHC a bit. For instance, there is a blocking call to wait for input data in org.asynchttpclient.multipart.MultipartBody#read(ByteBuffer buffer) which is implemented on top of Netty's io.netty.handler.stream.ChunkedInput.
This somewhat works. The problem is that my custom AsyncHandler will not get onBodyPartReceived() callbacks until the request has finished streaming. The receive events get piled up, probably because Netty isn't reading while there is still content to write. Playing with the network stack, I noticed I was only able to receive server responses while streaming if the client hit network contention while writing.
Can someone tell me whether this behavior is the result of my particular implementation (blocking in MultipartBody#read()) or an architectural design constraint imposed by Netty's internal implementation?
As a side note, reading and writing happen inside a single IO thread, nioEventLoopGroup-X.

What are the pitfalls of using Websockets in place of RESTful HTTP?

I am currently working on a project in which the client requests a big job and sends it to the server. The server then divides up the job and responds with an array of urls for the client to make GET calls on and stream back the data. I am the greenhorn on the project, and I am currently using Spring websockets to improve efficiency: instead of the clients constantly pinging the server to see if it has results ready to stream back, the websocket now just directly contacts the client. Hooray!
Would it be a bad idea to have websockets manage the whole process from end to end? I am using STOMP with Spring websockets; will there still be major issues with ditching REST?
With RESTful HTTP you have a stateless request/response system where the client sends request and server returns the response.
With webSockets you have a stateful (or potentially stateful) message passing system where messages can be sent either way and sending a message has a lower overhead than with a RESTful HTTP request/response.
The two are fairly different structures with different strengths.
The primary advantages of a connected webSocket are:
Two way communication. So, the server can notify the client of anything at any time. So, instead of polling a server on some regular interval to see if there is something new, a client can establish a webSocket and just listen for any messages coming from the server. From the server's point of view, when an event of interest for a client occurs, the server simply sends a message to the client. The server cannot do this with plain HTTP.
Lower overhead per message. If you anticipate a lot of traffic flowing between client and server, then there's a lower overhead per message with a webSocket. This is because the TCP connection is already established and you just have to send a message on an already-open socket. With an HTTP REST request, you have to first establish a TCP connection, which takes several back-and-forths between client and server. Then you send the HTTP request, receive the response and close the TCP connection. The HTTP request will necessarily include some overhead, such as all the cookies that are aligned with that server, even if those are not relevant to the particular request. HTTP/2 (the newest HTTP spec) allows for some additional efficiency in this regard if it is being used by both client and server, because a single TCP connection can be used for more than just a single request/response. If you charted all the requests/responses going on at the TCP level just to make an https REST request/response, you'd be surprised how much is going on compared to just sending a message over an already-established webSocket.
Higher Scale in some circumstances. With lower overhead per message and no client polling to find out if something is new, this can lead to added scalability (higher number of clients a given server can serve). There are downsides to the webSocket scalability too (see below).
Stateful connections. Without resorting to cookies and session IDs, you can directly store state in your program for a given connection. While a lot of development has been done with stateless connections to solve most problems, sometimes it's just simpler with stateful connections.
The primary advantages of a RESTful HTTP request/response are:
Universal support. It's hard to get more universally supported than HTTP. While webSockets enjoy relatively good support now, there are still some circumstances where webSocket support isn't regularly available.
Compatible with more server environments. There are server environments that don't allow long running server processes (some shared hosting situations). These environments can support HTTP request, but can't support long running webSocket connections.
Higher Scale in some circumstances. The webSocket requirement for a continuously connected TCP socket adds some new scale requirements to the server infrastructure that HTTP requests don't demand. So, this ends up being a tradeoff space. If the advantages of webSockets aren't really needed or being used in a significant way, then HTTP requests might actually scale better. It definitely depends upon the specific usage profile.
For a one-off request/response, a single HTTP request is more efficient than establishing a webSocket, using it and then closing it. This is because opening a webSocket starts with an HTTP request/response, and only after both sides have agreed to upgrade to a webSocket connection can the actual webSocket messages be sent.
Stateless. If your job is not made more complicated by having a stateless infrastructure, then a stateless world can make scaling or fail-over much easier (just add or remove server processes behind a load balancer).
Automatically Cacheable. With the right server settings, http responses can be cached by browser or by proxies. There is no such built-in mechanism for requests sent via webSockets.
So, to address the way you asked the question:
What are the pitfalls of using websockets in place of RESTful HTTP?
At large scale (hundreds of thousands of clients), you may have to do some special server work in order to support large numbers of simultaneously connected webSockets.
Not all possible clients or toolsets support webSockets, or support requests made over them to the same level they support HTTP requests.
Some of the less expensive server environments don't support the long running server processes required to support webSockets.
If it's important to your application to get progress notifications back to the client, you could either use a long-running http connection with progress updates continually sent down it, or you can use a webSocket. The webSocket is likely easier. If you really only need the webSocket for the relatively short duration of this particular activity, then you may find the best overall set of tradeoffs comes from using a webSocket only for the duration of time when you need the ability to push data to the client, and then using http requests for the normal request/response activities.
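For concreteness, a sketch of that hybrid (written with akka-http since it appears elsewhere on this page, though the shape is the same in any framework; progressSource is an assumed helper and the routes are made up):
    import akka.http.scaladsl.model.ws.Message
    import akka.http.scaladsl.server.Directives._
    import akka.stream.scaladsl.{Flow, Sink, Source}

    // Assumed helper: emits one message per progress event of the given job.
    def progressSource(jobId: String): Source[Message, Any] = ???

    val route =
      path("jobs") {
        post { complete("""{"jobId":"123"}""") } // ordinary request/response to submit a job
      } ~
      path("jobs" / Segment / "progress") { jobId =>
        // short-lived, push-only webSocket used just for progress updates
        handleWebSocketMessages(Flow.fromSinkAndSource(Sink.ignore, progressSource(jobId)))
      }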
It really depends on your requirements. REST services can be much more transparent and easier for developers to pick up than WebSockets.
Using WebSockets, you give up most of the advantages that RESTful webservices offer, such as the ability to reference a resource via a URI. Really what you should be doing is figure out what the advantages of REST and hypermedia are, and based on that decide whether those advantages are important to you.
It's of course entirely possible to create a RESTful webservice and augment it with a websocket-based API for real-time responses.
But if you are creating a service that only you are going to consume in a controlled environment, the only disadvantage might be that not every client supports websockets, while pretty much any type of environment can make a simple http call.