Failures in streaming handling of requests - what happens to connection?


The documentation for akka-http explains that it is important to consume a request stream entirely since bytes that are not pulled will be interpreted as backpressure (https://doc.akka.io/docs/akka-http/current/implications-of-streaming-http-entity.html). When you know beforehand that the stream can be ignored you should use discardEntityBytes, or otherwise read it fully. There is also the option of closing the connection by attaching the stream to a Sink.cancelled.
My question is what happens when the stream fails.
Is the stream drained or is the connection closed? Or is it the responsibility of the implementation to recover from errors and either drain or close the connection? If so, what is a good code pattern for this?
Does it matter if a request is completed with a Future or if the response is streaming?
What if, instead of an unexpected failure, you determine halfway through the stream that the rest of the stream can be ignored? Is throwing an exception a good way of stopping stream processing?
Example completing with a future:
val route =
  post {
    extractDataBytes { data =>
      complete {
        data
          .via(flow1)
          .via(flow2) // say error happens here at some point
          .runWith(sink)
      }
    }
  }

If the server connection runs into a problem, the connection is closed automatically.

Related

How does the Camel Netty TCP socket consumer decide how to split incoming data into messages (and is it configurable)?

I'm working with a Camel flow that uses a Netty TCP socket consumer to receive messages from a client program (which is outside of my control). The client should be opening a socket, sending us one message, then closing the socket, but we've been seeing cases where instead of one message Camel is "splitting" the text stream into two parts and trying to process them separately.
So I'm trying to figure out, since you can re-use the same socket for multiple Camel messages, but TCP sockets don't have a built-in concept of "frames" or a standard for message delimiters, how does Camel decide that a complete message has been received and is ready to process? I haven't been able to find a documented answer to this in the Netty component docs (https://camel.apache.org/components/3.15.x/netty-component.html), although maybe I'm missing something.
From playing around with a test script, it seems like one answer is "Camel assumes a message is complete and should be processed if it goes more than 1ms without receiving any input on the socket". Is this a correct statement, and if so is this behavior documented anywhere? Is there any way to change or configure this behavior? Really what I would prefer is for Camel to wait for an ETX character (or a much longer timeout) before processing a message, is that possible to set up?
Here's my test setup:
Camel flow:
from("netty:tcp://localhost:3003")
    .log("Received: ${body}");
Python snippet:
import math
import socket
import time

# args is assumed to hold the parsed command-line arguments
# (hostname, port, msg); the parsing code is not shown.
DELAY_MS = 3

def send_msg(sock, msg):
    print("Sending message: <{}>".format(msg))
    # sendall() returns None on success
    if sock.sendall(msg.encode()) is not None:
        print("Message failed to send")
    time.sleep(DELAY_MS / 1000.0)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    print("Using DELAY_MS: {}".format(str(DELAY_MS)))
    s.connect((args.hostname, args.port))
    # Split the message in half and send the halves DELAY_MS apart
    cutoff = int(math.floor(len(args.msg) / 2))
    msg1 = args.msg[:cutoff]
    send_msg(s, msg1)
    msg2 = args.msg[cutoff:]
    send_msg(s, msg2)
    response = s.recv(1024)
except Exception as e:
    print(e)
finally:
    s.close()
I can see that with DELAY_MS=1 Camel logs one single message:
2022-02-21 16:54:40.689 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sent over the socket
But with DELAY_MS=2 it logs two separate messages:
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sen
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: t over the socket

After doing some more research, it seems like what I need to do is add a delimiter-based FrameDecoder to the decoders list.
Setting it up like this:
from("netty:tcp://localhost:3003?sync=true"
    + "&decoders=#frameDecoder,#stringDecoder"
    + "&encoders=#stringEncoder")
where frameDecoder is provided by
@Bean
ChannelHandlerFactory frameDecoder() {
    // ETX (0x03) marks the end of each message
    ByteBuf[] ETX_DELIM = new ByteBuf[] { Unpooled.wrappedBuffer(new byte[] { (byte) 3 }) };
    return ChannelHandlerFactories.newDelimiterBasedFrameDecoder(1024, ETX_DELIM,
            false, "tcp");
}
seems to do the trick.
On the flip side though, it seems like this will hang indefinitely (or until lower-level TCP timeouts kick in?) if an ETX frame is not received, and I can't figure out any way to set a timeout on the decoder, so would still be eager for input if anyone knows how to do that.
I think the default "timeout" behavior I was seeing might've just been an artifact of Netty's read loop speed (see the question "How does netty determine when a read is complete?").
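To make the framing idea concrete outside of Camel, here is a minimal Python sketch of what a delimiter-based frame decoder does in principle: it buffers whatever chunks the socket delivers and only hands out a message once the ETX byte has arrived, so the timing of the individual TCP reads no longer matters. The class name EtxFramer and its max_frame_length parameter are made up for this illustration and are not part of Netty or Camel; unlike the configuration above, this sketch simply drops the delimiter from each frame.

```python
ETX = b"\x03"  # end-of-text byte used as the message delimiter


class EtxFramer:
    """Hypothetical helper: buffers raw chunks and emits ETX-terminated frames."""

    def __init__(self, max_frame_length=1024):
        self.max_frame_length = max_frame_length
        self._buffer = bytearray()

    def feed(self, chunk):
        """Add one chunk (as returned by recv) and return the completed frames."""
        self._buffer.extend(chunk)
        frames = []
        while True:
            end = self._buffer.find(ETX)
            if end < 0:
                break  # no complete frame yet, keep buffering
            frames.append(bytes(self._buffer[:end]))  # delimiter is dropped
            del self._buffer[:end + 1]
        # Guard against a sender that never sends the delimiter
        if len(self._buffer) > self.max_frame_length:
            raise ValueError("frame exceeds max_frame_length without a delimiter")
        return frames


framer = EtxFramer()
first = framer.feed(b"a long string sen")        # partial data, nothing emitted
second = framer.feed(b"t over the socket\x03")   # delimiter arrives, frame emitted
print(first, second)
```

The length guard only bounds memory use; it does not address the open question of timing out when the delimiter never arrives.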

How to get WebSocket close code from Akka HTTP?

We are using Akka HTTP to handle our web socket connections using the akka streams API. We are using a Flow that pipes the incoming messages to a "connection actor". A snippet of the code is below:
val connection = system.actorOf(ConnectionActor.props())
val in = Flow[Message]
  .to(Sink.actorRef[Message](connection, WebSocketClosed))
val out = Source
  .actorRef[Message](500, OverflowStrategy.fail)
  .mapMaterializedValue(ws => connection ! WebSocketOpened(ws))
Flow.fromSinkAndSource(in, out)
When the web socket is closed, the connection actor is sent the "WebSocketClosed" message and we clean up internal resources. We now have the requirement to know what the reason for closing the connection was according to the standard WebSocket CloseEvent codes.
Is there a way to get the close code from Akka HTTP and send it on to the connection actor so it can take the appropriate action?

I was able to handle client (browser) error code in an akka-http 10.2.6 server.
My use case was to pipe incoming messages to a Sink created by ActorSink.actorRef[T](). When creating the sink, two callbacks, onCompleteMessage and onFailureMessage, can be set to convert a normal WebSocket close (code=1000) or an error into our custom message types.
I suppose that client close/error maps to Flow complete/failure, that means other sinks should be able to handle close/error in a similar way.

As it turns out, this is not presently possible in Akka HTTP. See the following GitHub issue:
https://github.com/akka/akka-http/issues/2458
It looks as though this will need to be addressed before this is possible.

In UWP StreamSocket, can I read data with timeout and leave the connection open if timeout elapses

As I couldn't find any way to peek for data (read data without consuming the buffer), as asked in "How to peek StreamSocket for data in UWP apps", I'm now trying to make my own "peek", but still no luck.
I don't see how I can read data from a StreamSocket in a way that lets me use timeouts and leaves the connection usable if the timeout elapses.
In the end, the problem is as follows. In my, let's say, IMAP client, I get a response from a server, and if this response is negative, I need to wait a bit to see if the server immediately sends yet another response (sometimes the server does this, with extra details on the error or even a zero packet to close the connection). If the server doesn't send another response, I'm fine: I just leave the method and return to the caller. The caller can then send more data to the stream, receive more responses, etc.
So, after sending a request and getting the initial response, I need in some cases to read the socket once more with a very small timeout interval and, if no data arrives, just do nothing.

You can use a CancellationTokenSource to generate a timeout and stop an async operation.
The DataReader consumes the data from the input stream of the StreamSocket. Its LoadAsync() method will return when there is at least one byte of data. Here, we are adding a cancellation source that will cancel the asynchronous task after 1 second to stop the DataReader.LoadAsync() if no data has been consumed.
var stream = new StreamSocket();
var inputStream = stream.InputStream;
var reader = new DataReader(inputStream);
reader.InputStreamOptions = InputStreamOptions.Partial;
while (true)
{
    try
    {
        // Cancel the pending read if nothing arrives within 1 second
        var timeoutSource = new CancellationTokenSource(TimeSpan.FromSeconds(1));
        var data = await reader.LoadAsync(1).AsTask(timeoutSource.Token);
        while (reader.UnconsumedBufferLength > 0)
        {
            // Only one byte was requested, so read one byte at a time
            var read = reader.ReadByte();
        }
    }
    catch (TaskCanceledException)
    {
        // timeout
    }
}
Do not forget that disposing the DataReader will close the stream and the connection.
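For comparison, the same "read with a short timeout, keep the connection if nothing arrives" idea can be sketched with plain Python sockets. This is not UWP code and only illustrates the pattern; the helper name read_with_timeout and the sample server message are made up for the example, and a local socket pair stands in for the client and the server.

```python
import socket


def read_with_timeout(sock, max_bytes, timeout_seconds):
    """Return received bytes, or None if nothing arrived within the timeout.

    The socket stays open and usable in both cases.
    """
    previous = sock.gettimeout()
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(max_bytes)
    except socket.timeout:
        return None  # timed out: nothing was read, connection left as it was
    finally:
        sock.settimeout(previous)  # restore the caller's timeout setting


# Demo on a local socket pair standing in for client and server.
client, server = socket.socketpair()
nothing = read_with_timeout(client, 1024, 0.05)  # server is silent
server.sendall(b"* BYE closing\r\n")
late = read_with_timeout(client, 1024, 0.05)     # connection still works
print(nothing, late)
client.close()
server.close()
```

Note that recv returning an empty bytes object (rather than None) means the peer closed the connection, so the caller can tell "no data yet" apart from "connection closed".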

Streaming data in and out simultaneously on a single HTTP connection in play

streaming data out of play, is quite easy.
here's a quick example of how I intend to do it (please let me know if i'm doing it wrong):
def getRandomStream = Action { implicit req =>
  import scala.util.{Failure, Random, Success}
  import scala.concurrent.{blocking, ExecutionContext, Future}
  import ExecutionContext.Implicits.global
  def getSomeRandomFutures: List[Future[String]] = {
    for {
      i <- (1 to 10).toList
      r = Random.nextInt(30000)
    } yield Future {
      blocking {
        Thread.sleep(r)
      }
      s"after $r ms. index: $i.\n"
    }
  }
  val enumerator = Concurrent.unicast[Array[Byte]] {
    (channel: Concurrent.Channel[Array[Byte]]) => {
      getSomeRandomFutures.foreach {
        _.onComplete {
          case Success(x: String) => channel.push(x.getBytes("utf-8"))
          // the channel carries bytes, so the error message is encoded too
          case Failure(t) => channel.push(t.getMessage.getBytes("utf-8"))
        }
      }
      //following future will close the connection
      Future {
        blocking {
          Thread.sleep(30000)
        }
      }.onComplete {
        case Success(_) => channel.eofAndEnd()
        case Failure(t) => channel.end(t)
      }
    }
  }
  new Status(200).chunked(enumerator).as("text/plain;charset=UTF-8")
}
now, if you get served by this action, you'll get something like:
after 1757 ms. index: 10.
after 3772 ms. index: 3.
after 4282 ms. index: 6.
after 4788 ms. index: 8.
after 10842 ms. index: 7.
after 12225 ms. index: 4.
after 14085 ms. index: 9.
after 17110 ms. index: 1.
after 21213 ms. index: 2.
after 21516 ms. index: 5.
where every line is received after the random time has passed.
now, imagine I want to preserve this simple example when streaming data from the server to the client, but I also want to support full streaming of data from the client to the server.
So, lets say i'm implementing a new BodyParser that parses the input into a List[Future[String]]. this means, that now, my Action could look like something like this:
def getParsedStream = Action(myBodyParser) { implicit req =>
  val xs: List[Future[String]] = req.body
  val enumerator = Concurrent.unicast[Array[Byte]] {
    (channel: Concurrent.Channel[Array[Byte]]) => {
      xs.foreach {
        _.onComplete {
          case Success(x: String) => channel.push(x.getBytes("utf-8"))
          // the channel carries bytes, so the error message is encoded too
          case Failure(t) => channel.push(t.getMessage.getBytes("utf-8"))
        }
      }
      //again, following future will close the connection
      Future.sequence(xs).onComplete {
        case Success(_) => channel.eofAndEnd()
        case Failure(t) => channel.end(t)
      }
    }
  }
  new Status(200).chunked(enumerator).as("text/plain;charset=UTF-8")
}
but this is still not what I wanted to achieve. in this case, I’ll get the body from the request only after the request was finished, and all the data was uploaded to the server. but I want to start serving request as I go. a simple demonstration, would be to echo any received line back to the user, while keeping the connection alive.
so here's my current thoughts:
what if my BodyParser would return an Enumerator[String] instead of List[Future[String]]?
in this case, I could simply do the following:
def getParsedStream = Action(myBodyParser) { implicit req =>
  new Status(200).chunked(req.body).as("text/plain;charset=UTF-8")
}
so now, i'm facing the problem of how to implement such a BodyParser.
being more precise as to what exactly I need, well:
I need to receive chunks of data to parse as a string, where every string ends in a newline \n (may contain multiple lines though...). every "chunk of lines" would be processed by some (irrelevant to this question) computation, which would yield a String, or better, a Future[String], since this computation may take some time. the resulted strings of this computation, should be sent to the user as they are ready, much like the random example above. and this should happen simultaneously while more data is being sent.
I have looked into several resources trying to achieve it, but was unsuccessful so far.
e.g. scalaQuery play iteratees -> it seems like this guy is doing something similar to what I want to do, but I couldn't translate it into a usable example. (and the differences from play2.0 to play2.2 API doesn't help...)
So, to sum it up: Is this the right approach (considering I don't want to use WebSockets)? and if so, how do I implement such a BodyParser?
EDIT:
I have just stumbled upon a note in the Play documentation regarding this issue, saying:
Note: It is also possible to achieve the same kind of live
communication the other way around by using an infinite HTTP request
handled by a custom BodyParser that receives chunks of input data, but
that is far more complicated.
so, i'm not giving up, now that I know for sure this is achievable.

What you want to do isn't quite possible in Play.
The problem is that Play can't start sending a response until it has completely received the request. So you can either receive the request in its entirety and then send a response, as you have been doing, or you can process requests as you receive them (in a custom BodyParser), but you still can't reply until you've received the request in its entirety (which is what the note in the documentation was alluding to - although you can send a response in a different connection).
To see why, note that an Action is fundamentally a (RequestHeader) => Iteratee[Array[Byte], SimpleResult]. At any time, an Iteratee is in one of three states: Done, Cont, or Error. It can only accept more data if it's in the Cont state, but it can only return a value when it's in the Done state. Since that return value is a SimpleResult (i.e., our response), this means there's a hard cut-off between receiving data and sending data.
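The three-state argument can be illustrated with a toy model. The Python sketch below is not Play's Iteratee API; the class LineCollector and its methods are invented purely to show the constraint described above: input is accepted only in the Cont state, and a result exists only in the Done state, so the consumer cannot hand back a value while it is still receiving.

```python
class LineCollector:
    """Toy consumer with Cont / Done / Error states (illustration only)."""

    def __init__(self, wanted):
        self.wanted = wanted   # number of chunks to collect before finishing
        self.lines = []
        self.state = "Cont"
        self.error = None

    def feed(self, chunk):
        """Accept one chunk of bytes; only allowed while in the Cont state."""
        if self.state != "Cont":
            raise RuntimeError("cannot accept input in state " + self.state)
        try:
            self.lines.append(chunk.decode("utf-8"))
        except UnicodeDecodeError as exc:
            self.state, self.error = "Error", exc
            return
        if len(self.lines) == self.wanted:
            self.state = "Done"

    def result(self):
        """Return the final value; only available in the Done state."""
        if self.state != "Done":
            raise RuntimeError("no result available in state " + self.state)
        return "".join(self.lines)


collector = LineCollector(wanted=2)
collector.feed(b"hello\n")
try:
    collector.result()          # still in Cont: no value yet
    early_error = None
except RuntimeError as exc:
    early_error = exc
collector.feed(b"world\n")      # second chunk moves the collector to Done
final = collector.result()
print(early_error, repr(final))
```

In this model the response plays the role of result(): it becomes available only after the last chunk of the request has been fed in.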
According to this answer, the HTTP standard does allow a response before the request is complete, but most browsers don't honor the spec, and in any case Play doesn't support it, as explained above.
The simplest way to implement full-duplex communication in Play is with WebSockets, but we've ruled that out. If server resource usage is the main reason for the change, you could try parsing your data with play.api.mvc.BodyParsers.parse.temporaryFile, which will save the data to a temporary file, or play.api.mvc.BodyParsers.parse.rawBuffer, which will overflow to a temporary file if the request is too large.
Otherwise, I can't see a sane way to do this using Play, so you may want to look at using another web server.

"Streaming data in and out simultaneously on a single HTTP connection in play"
I haven't finished reading all of your question, nor the code, but what you're asking to do isn't available in HTTP. That has nothing to do with Play.
When you make a web request, you open a socket to a web server and send "GET /file.html HTTP/1.1\n[optional headers]\n[more headers]\n\n"
You get a response after (and only after) you have completed your request (optionally including a request body as part of the request). When and only when the request and response are finished, in HTTP 1.1 (but not 1.0) you can make a new request on the same socket (in http 1.0 you open a new socket).
It's possible for the response to "hang" ... this is how web chats work. The server just sits there, hanging onto the open socket, not sending a response until someone sends you a message. The persistent connection to the web server eventually provides a response when/if you receive a chat message.
Similarly, the request can "hang." You can start to send your request data to the server, wait a bit, and then complete the request when you receive additional user input. This mechanism provides better performance than continually creating new http requests on each user input. A server can interpret this stream of data as a stream of distinct inputs, even though that wasn't necessarily the initial intention of the HTTP spec.
HTTP does not support a mechanism to receive part of a request, then send part of a response, then receive more of a request. It's just not in the spec. Once you've begun to receive a response, the only way to send additional information to the server is to use another HTTP request. You can use one that's already open in parallel, or you can open a new one, or you can complete the first request/response and issue an additional request on the same socket (in 1.1).
If you must have asynchronous I/O on a single socket connection, you might want to consider a protocol other than HTTP.

NSURLRequest with HTTPBody input stream: Stream sends event before being opened

I want to send a large amount of data to a server using NSURLConnection (and NSURLRequest). For this I create a bound pair of NSStreams (using CFStreamCreateBoundPair(...)). Then I pass the input stream to the NSURLRequest (-setHTTPBodyStream:) and schedule the output stream on the current run loop. When the run loop continues, I get the events to send data and the input stream sends this data to the server.
My problem is, that this only works when the data fits into the buffer between the paired streams. If the data is bigger, then somehow the input stream gets an event (I assume "bytes available") but the NSURLConnection has not yet opened the input stream. This results in an error message printed and the data is not being sent.
I tried to catch this in my -stream:handleEvent: method by simply returning if the input stream is not yet opened, but then my output stream gets a stream closed event (probably because I never sent data when I could).
So my question is: How to use a bound pair of streams with NSURLConnection correctly?
(If this matters: I'm developing on the iOS platform)
Any help is appreciated!
Cheers, Markus

Ok, I kind of fixed this by delaying the start of the upload, so that it starts after the NSURLConnection has had time to set up its input stream.
It's not what I call a clean solution though, since relying on -performSelector:withObject:afterDelay: seems a bit hacky.
So if anyone else has a solution to this, I'm still open for any suggestions.
