How is determining body length by closing the connection reliable? (RFC 2616, section 4.4.5)

I can't get one thing straight. RFC 2616, section 4.4.5, states that the message length can be determined "by the server closing the connection".
This implies that it is valid for a server to send a response (e.g. a large image) that has no Content-Length header, and that the client is supposed to keep reading until the connection is closed and then assume all data has been downloaded.
But how is a client to know for sure that the connection was closed intentionally by the server? The server application could have crashed in the middle of sending the data, and the server's OS would then most likely send a FIN packet to close the TCP connection with the client gracefully.

You are absolutely right: that mechanism is unreliable. RFC 7230 acknowledges this:
Since there is no way to distinguish a successfully completed, close-delimited message from a partially received message interrupted by network failure, a server SHOULD generate encoding or length-delimited messages whenever possible. The close-delimiting feature exists primarily for backwards compatibility with HTTP/1.0.
Fortunately, most HTTP traffic today is HTTP/1.1, where Content-Length or Transfer-Encoding explicitly defines the end of each message.
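To make the three ways of ending a body concrete, here is a minimal Python sketch of the decision a client makes, following a simplified reading of RFC 7230, section 3.3.3. The function name body_framing and the lower-cased header dictionary are assumptions for illustration; the sketch deliberately ignores special cases such as responses to HEAD, 1xx/204/304 status codes, and invalid Content-Length values.

```python
from __future__ import annotations


def body_framing(headers: dict[str, str]) -> tuple[str, int | None]:
    """Decide how a response body is delimited (simplified RFC 7230, sec. 3.3.3).

    `headers` maps lower-cased field names to values (hypothetical format).
    Returns ("chunked", None), ("length", n) or ("close", None).
    """
    te = headers.get("transfer-encoding", "")
    codings = [c.strip().lower() for c in te.split(",") if c.strip()]
    if codings:
        # Transfer-Encoding overrides Content-Length. If chunked is the final
        # coding, the zero-size last chunk marks the end of the body.
        if codings[-1] == "chunked":
            return ("chunked", None)
        # Otherwise a response body runs until the server closes the connection.
        return ("close", None)
    if "content-length" in headers:
        # The body is exactly this many octets, so truncation is detectable.
        return ("length", int(headers["content-length"]))
    # No declared length: close-delimited, so truncation cannot be detected.
    return ("close", None)


print(body_framing({"content-length": "1234"}))  # ('length', 1234)
```

Only the "close" result leaves the client unable to tell a complete body from a truncated one, which is exactly the weakness discussed above.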
The lesson is that a message must have its own way of signaling termination; we cannot repurpose the underlying transport layer's EOF as the message's EOF.
On that note, a (well-formed) HTML document, or a .gif, .avi, etc., does define its own termination, so we will know if we received an incomplete document. Therefore it is not much of a problem to transmit it over HTTP/1.0 without Content-Length.
However, for plain-text documents, JavaScript, CSS, etc., EOF is what marks the end of the document, so they are problematic over HTTP/1.0.
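As an illustration of a format that carries its own terminator: a GIF file starts with a GIF87a or GIF89a signature and ends with the trailer byte 0x3B. The hypothetical helper below, a minimal Python sketch, uses that to flag an obviously truncated download. A passing check is only a hint, since a cut could happen to land right after a 0x3B byte.

```python
def gif_looks_complete(data: bytes) -> bool:
    """Cheap truncation check for a GIF image (hypothetical helper).

    A missing signature or trailer byte (0x3B, ';') means the file is broken
    or was cut short; a present trailer does not prove completeness.
    """
    has_signature = data[:6] in (b"GIF87a", b"GIF89a")
    has_trailer = data[-1:] == b"\x3b"
    return has_signature and has_trailer
```

A plain-text or CSS file has no equivalent marker, so no such check is possible for it.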

Related

Bidirectional communication of Unix sockets

I'm trying to create a server that sets up a Unix socket and listens for clients which send/receive data. I've made a small repository to recreate the problem.
The server runs and can receive data from the clients that connect, but I can't get the client to read the server's response without causing an error on the server.
I have commented out the offending code on the client and server. Uncomment both to recreate the problem.
When the code to respond to the client is uncommented, I get this error on the server:
thread '' panicked at 'called Result::unwrap() on an Err value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/main.rs:77:42
Your code calls set_read_timeout to set the timeout on the socket. Its documentation states that on Unix it results in a WouldBlock error in case of timeout, which is precisely what happens to you.
As to why the read times out, the likely reason is that the server calls stream.read_to_string(&mut response), which reads the stream until end-of-file. Your client, on the other hand, calls write_all() followed by flush(), and then (after uncommenting the offending code) attempts to read the response. Because the client is waiting to read, it never closes the stream, so the server never sees EOF: each side waits for the other, and you have a deadlock on your hands. Note that none of this is specific to Rust; you would have exactly the same issue in C++ or Python.
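If exactly one request and one response are exchanged per connection, the deadlock can also be broken by having the client half-close its write side, so the server's read-to-EOF finishes while the client can still read the reply. Below is a minimal Python sketch using socket.socketpair() (a connected pair of Unix domain sockets on Unix), with the client and server steps run in one thread for brevity; the function name is hypothetical. In Rust, the corresponding call would be stream.shutdown(std::net::Shutdown::Write). This only works for a single message per direction, which is why a framing protocol, as described in the next paragraph, is the more general fix.

```python
import socket


def request_response_with_half_close(request: bytes) -> bytes:
    """One request/response exchange delimited by EOF (hypothetical demo)."""
    client, server = socket.socketpair()  # AF_UNIX pair on Unix
    with client, server:
        client.sendall(request)
        client.shutdown(socket.SHUT_WR)  # "no more data": the server will see EOF

        # Server side: read to EOF, like read_to_string() in the question.
        received = []
        while chunk := server.recv(4096):
            received.append(chunk)
        server.sendall(b"echo: " + b"".join(received))
        server.shutdown(socket.SHUT_WR)

        # Client side: its read half is still open, so the reply arrives.
        reply = []
        while chunk := client.recv(4096):
            reply.append(chunk)
        return b"".join(reply)
```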
To fix the issue, you need to use a protocol in your communication. A very simple protocol could consist of first sending the message size (in a fixed format, perhaps 4 bytes in length) and only then the actual message. The code that reads from the stream would do the same: first read the message size and then the message itself. Even better than inventing your own protocol would be to use an existing one, e.g. to exchange messages using serde.
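The length-prefix protocol described above can be sketched in Python as follows. The 4-byte big-endian (network byte order) header and the names send_msg, recv_msg and recv_exact are choices made for this example, not something the original code defines. recv_exact loops because a single recv() may return fewer bytes than requested.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # 4-byte unsigned length prefix, network byte order


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the stream first."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError(f"peer closed after {len(buf)} of {n} bytes")
        buf += chunk
    return bytes(buf)


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send the payload length first, then the payload itself."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_msg(sock: socket.socket) -> bytes:
    """Read the length prefix, then exactly that many payload bytes."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)
```

With this in place neither side needs EOF to know where a message ends, so the same connection can carry any number of requests and responses.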

TCP/IP using Ada Sockets: How to correctly finish a packet? [duplicate]

I'm attempting to implement the Remote Frame Buffer protocol using Ada's Sockets library and I'm having trouble controlling the length of the packets that I'm sending.
I'm following the RFC 6143 specification (https://tools.ietf.org/pdf/rfc6143.pdf); see the comments in the code for section numbers:
```ada
-- Section 7.1.1
String'Write (Comms, Protocol_Version);
Put_Line ("Server version: '"
          & Protocol_Version (1 .. 11) & "'");
String'Read (Comms, Client_Version);
Put_Line ("Client version: '"
          & Client_Version (1 .. 11) & "'");

-- Section 7.1.2
-- Server sends security types
U8'Write (Comms, Number_Of_Security_Types);
U8'Write (Comms, Security_Type_None);
-- client replies by selecting a security type
U8'Read (Comms, Client_Requested_Security_Type);
Put_Line ("Client requested security type: "
          & Client_Requested_Security_Type'Image);

-- Section 7.1.3
U32'Write (Comms, Byte_Reverse (Security_Result));

-- Section 7.3.1
U8'Read (Comms, Client_Requested_Shared_Flag);
Put_Line ("Client requested shared flag: "
          & Client_Requested_Shared_Flag'Image);
Server_Init'Write (Comms, Server_Init_Rec);
```
The problem seems to be (according to Wireshark) that my calls to the various 'Write procedures cause bytes to queue up on the socket without being sent.
Consequently, two or more packets' worth of data are sent as one, resulting in malformed packets: sections 7.1.2 and 7.1.3 are sent consecutively in one packet instead of being split into two.
I had wrongly assumed that 'Reading from the socket would cause the outgoing data to be flushed out, but that does not appear to be the case.
How do I tell Ada's Sockets library "this packet is finished, send it right now"?
To emphasize user207421's comment (https://stackoverflow.com/users/207421/user207421):
I'm not a protocols guru, but from my own experience, the usage of TCP (see RFC 793) is often misunderstood.
Regarding what you observed in Wireshark, with sections 7.1.2 and 7.1.3 arriving together in one packet instead of two:
In short, TCP is not message-oriented.
With TCP, sending/writing to a socket only appends data to the TCP stream. The TCP stack is free to send it in one segment or several, and if you have lengthy data to send and a message-oriented protocol to implement on top of TCP, you may need to handle message reconstruction yourself. Usually, a special end-of-message sequence of characters is added at the end of each message.
From RFC 793: "Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module to transmit each segment to the destination TCP. The receiving TCP places the data from a segment into the receiving user's buffer and notifies the receiving user. The TCPs include control information in the segments which they use to ensure reliable ordered data transmission."
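A minimal Python sketch of the end-of-message approach: the hypothetical DelimitedReader class buffers whatever chunks the socket delivers and only hands out complete messages. It assumes the delimiter (a newline here) never occurs inside a message; binary payloads would need escaping or a length prefix instead.

```python
from __future__ import annotations


class DelimitedReader:
    """Reassembles delimiter-terminated messages from arbitrary TCP chunks."""

    def __init__(self, delimiter: bytes = b"\n") -> None:
        self.delimiter = delimiter
        self.buffer = b""  # start of a message whose delimiter has not arrived yet

    def feed(self, chunk: bytes) -> list[bytes]:
        """Append newly received bytes and return every message completed so far."""
        self.buffer += chunk
        # Everything before the last delimiter is complete; the tail stays buffered.
        *complete, self.buffer = self.buffer.split(self.delimiter)
        return complete
```

In a receive loop you would call reader.feed(sock.recv(4096)) and process each returned message, regardless of how TCP happened to split or merge the bytes.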
See also https://stackoverflow.com/a/11237634/7237062, quoting:
TCP is a stream-oriented connection, not message-oriented. It has no concept of a message. When you write out your serialized string, it only sees a meaningless sequence of bytes. TCP is free to break up that stream up into multiple fragments and they will be received at the client in those fragment-sized chunks. It is up to you to reconstruct the entire message on the other end.
In your scenario, one would typically send a message length prefix. This way, the client first reads the length prefix so it can then know how large the incoming message is supposed to be.
Or, from TCP Connection Seems to Receive Incomplete Data:
The recv function can receive as little as 1 byte, you may have to call it multiple times to get your entire payload. Because of this, you need to know how much data you're expecting. Although you can signal completion by closing the connection, that's not really a good idea.
Update:
I should also mention that the send function has the same conventions as recv: you have to call it in a loop because you cannot assume that it will send all your data. While it might always work in your development environment, that's the kind of assumption that will bite you later.
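The send side of that advice, as a minimal Python sketch: socket.send() returns the number of bytes it actually accepted, which may be less than the buffer, so the hypothetical send_all helper keeps going until everything has been handed over. In practice, Python's built-in socket.sendall() already implements this loop; the explicit version only shows what it does.

```python
import socket


def send_all(sock: socket.socket, data: bytes) -> None:
    """Call send() repeatedly until every byte has been accepted."""
    view = memoryview(data)  # slicing a memoryview avoids copying the rest
    while len(view) > 0:
        sent = sock.send(view)  # may accept only part of the buffer
        view = view[sent:]
```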

PlayWS calculate the size of a http call without consuming the stream

I'm currently using the PlayWS HTTP client, which returns an Akka stream. From my understanding, I can consume the stream and turn it into a Byte[] to calculate the size. However, this also consumes the stream, and I can't use it anymore. Is there any way around this?
I think there are two different aspects to the question.
First, you may want to know the size of the server response in advance in order to prepare a buffer. Unfortunately, there is no guaranteed way to do this. The HTTP/1.1 spec explicitly allows a server that does not know the size of the response in advance to send it using chunked transfer encoding. See also this quote from RFC 7230, section 3.3.1, Transfer-Encoding:
A recipient MUST be able to parse the chunked transfer coding (Section 4.1) because it plays a crucial role in framing messages when the payload body size is not known in advance.
Section 3.3.3, Message Body Length, specifies how the length of a message body is determined, and besides the aforementioned chunked transfer encoding it also contains this rather unhelpful rule:
Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
This rule exists for backward compatibility and its use is discouraged, but it is still legal.
Still, in many real-world scenarios you can use the Content-Length header field that the server may return. However, there is a catch here as well: if gzip Content-Encoding is used, Content-Length will contain the size of the compressed body.
To sum up: in the general case you can't get the size of the message body before you have fully received the server response, i.e., in terms of code, before performing a blocking call on the response. You may try to use Content-Length, and it might or might not help in your specific case.
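A small Python sketch of why Content-Length is only a hint for the bytes on the wire. The helper size_hint and the lower-cased header dictionary are hypothetical, and gzip.compress simply stands in for a server applying Content-Encoding: gzip.

```python
from __future__ import annotations

import gzip


def size_hint(headers: dict[str, str]) -> int | None:
    """Best-effort body size from response headers (lower-cased names).

    Returns None when no length is declared (chunked or close-delimited).
    The value counts the body as sent, i.e. after any Content-Encoding.
    """
    if "transfer-encoding" in headers or "content-length" not in headers:
        return None
    return int(headers["content-length"])


body = b"hello world " * 1000  # what the application will end up with
wire = gzip.compress(body)  # what a gzip-encoding server actually sends
hint = size_hint({"content-encoding": "gzip", "content-length": str(len(wire))})
# hint equals len(wire), which is much smaller than len(body)
```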
Second, you may already have a fully downloaded response (or be OK with blocking on your StreamedResponse) and want to process it by first getting the size and only then processing the actual data. In that case, you may first use the getBodyAsBytes method, which returns IndexedSeq[Byte] and thus has a size, and then convert it into a new Source using Source.single, which is exactly what the default (i.e. non-streaming) implementation of getBodyAsSource does.
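The buffer-then-replay idea is not specific to Play WS. Here is a Python analogue (the name buffer_then_replay and the iterator-of-chunks stream are assumptions for illustration; this is not the Play WS API): the whole body is drained into memory once, which gives you its size, and a fresh one-element stream is built over the buffered bytes, much like Source.single. The cost is that the entire body must fit in memory.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator


def buffer_then_replay(chunks: Iterable[bytes]) -> tuple[int, Iterator[bytes]]:
    """Drain a one-shot stream, then return its size and a replayable stream."""
    data = b"".join(chunks)  # consumes (and blocks on) the whole original stream
    return len(data), iter([data])  # single-element stream, like Source.single
```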

Ensure Completeness of HTTP Messages

I am currently working on an application that is supposed to get a web page and extract information from its content.
As I learned from my research (or so it seems to me, at least), there is no ideal way to determine the end of an HTTP message.
Generally, I found two different ways to do so:
1. Set the O_NONBLOCK flag on the socket and fetch data with recv() in a while loop; assume the message is complete and break out of the loop as soon as no bytes are available.
2. Rely on the HTTP Content-Length header and determine the end of the message from it.
Neither approach seems completely safe to me. Solution (1) could break out of the recv loop before the message is complete, while solution (2) requires the Content-Length header to be set correctly.
What's the best way to proceed in this case? Can I always rely on the Content-Length header to be set?
Let me start here:
Can I always rely on the Content-Length header to be set?
No, you can't. Content-Length is an optional header. However, an HTTP message absolutely must provide a way to determine its body length if it is to be RFC-compliant (cf. RFC 7230, sec. 3.3.3). That being said, be ready to parse chunked encoding whenever a content length isn't specified.
As for your original problem: ensuring the completeness of a message is actually something that should be TCP's job. But since there are complications such as message pipelining, in practice it is best to check two things:
Have all reads from the network buffer been successful?
Is the number of the received bytes identical to the predicted message length?
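Applied to the Content-Length case, both checks can be sketched in Python as below. The helper name read_fixed_body is hypothetical, and stream is assumed to be a buffered binary file object, e.g. the result of sock.makefile("rb"). Reading exactly content_length bytes, and no more, also matters for pipelining: any bytes after the body belong to the next response and stay in the buffer.

```python
import io


def read_fixed_body(stream: io.BufferedIOBase, content_length: int) -> bytes:
    """Read a Content-Length-delimited body and verify that it is complete."""
    body = bytearray()
    while len(body) < content_length:
        chunk = stream.read(content_length - len(body))
        if not chunk:  # EOF before the promised number of bytes
            raise EOFError(f"got {len(body)} of {content_length} bytes")
        body += chunk
    return bytes(body)
```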
Oh, and as @MartinJames noted, non-blocking I/O probably isn't the best idea here.
The end of an HTTP response is determined as follows:
By the final (empty) chunk, if chunked Transfer-Encoding is used.
By reaching the given length, if a Content-Length header is given and chunked transfer encoding is not used.
By the end of the TCP connection, if neither chunked transfer encoding nor Content-Length is used.
In the first two cases you have a well-defined end, so you can verify that the data were fully received. Only in the last case (end of the TCP connection) do you not know whether the connection was closed before all the data was sent. But usually you get case 1 or case 2.
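For the chunked case, the final zero-size chunk is what lets a client detect truncation. Below is a minimal Python decoder for the chunked coding of RFC 7230, section 4.1, assuming a buffered binary stream such as sock.makefile("rb"); it ignores chunk extensions and discards trailer fields, and the function name is hypothetical.

```python
import io
from typing import BinaryIO


def read_chunked_body(stream: BinaryIO) -> bytes:
    """Decode a chunked body; raise EOFError if it is truncated."""
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line.endswith(b"\r\n"):
            raise EOFError("truncated chunk-size line")
        size = int(size_line.split(b";", 1)[0].strip(), 16)  # drop extensions
        if size == 0:
            break  # last-chunk: end of the body data
        data = stream.read(size + 2)  # chunk data followed by CRLF
        if len(data) < size + 2:
            raise EOFError("truncated chunk")
        if data[-2:] != b"\r\n":
            raise ValueError("malformed chunk")
        body += data[:-2]
    while True:  # optional trailer fields, terminated by an empty line
        line = stream.readline()
        if line == b"\r\n":
            return bytes(body)
        if not line.endswith(b"\r\n"):
            raise EOFError("truncated trailer section")


wire = io.BytesIO(b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n")
print(read_chunked_body(wire))  # b'hello world'
```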
To make your life easier, you might want to send a
Connection: close
header when making the HTTP request; then the web server will close the connection after giving you the full page requested, and you may not have to deal with chunks (an HTTP/1.1 server is still allowed to send a chunked response, though; only a request that indicates HTTP/1.0 rules that out).
This is only a viable option if you are interested only in this single page and will not request additional resources (script files, images, etc.); in the latter case it would be a very inefficient solution for both your app and the server.
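A minimal sketch of such a request in Python; the helper name is hypothetical and the request is built by hand only for illustration. The response to it must still be framed according to the rules listed above.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """A minimal HTTP/1.1 GET asking the server to close the connection
    after the response (hypothetical helper, not part of any library)."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # empty line ends the header section
    ).encode("ascii")
```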

How do I decode a websocket packet?

I'm using the Wireshark packet analyzer, and when I filter for all "WebSocket" packets I see what I am sending to and receiving from the host. When I check individual packets, mine always show as [MASKED], but you can 'Unmask Payload', which shows the data in clear text like this:
<IC sid="52ccc752-6080-4668-8f55-662020d83979" msqid="120l93l9l114l30l104"/>
However, if I use 'Follow TCP Stream' and look at that same packet, the data shows up as encoded in some way, like this:
....K#....../...y#..|...}...f...s...~...}...{G..r...kN.."G..z...r...'...'...z...d.
The problem is that all WebSocket packets I receive from the host come encoded. It is NOT SSL, and I can't figure out how to decode them; I have no idea what they are even encoded as (and yet my browser can decode them).
I assume the host's data is encoded with the same method that my own data appears to be encoded with when I use 'Follow TCP Stream'.
Can someone please help me figure out how to decode the data the host is sending me? The host's data is below:
~.^jVpZc9y4Ef4ryFQ5+yJpeB+JJJdmNPI6G++mrN249kkFkuChIQmG5Fgj//p0AyAJypzxyi5T6P76
QKPRuHz9cUu6IrlZuVYcx75rXXpGYFw6nhdcBqnrXnqeZVhGEtihH65Id7NKWEoPZb8iVfc/FDQt
owztMixN0yltozQNZ3V7ncZkbxrAXZE8vFkZK2g66msJchLIjyuoiWQmvvyApGUY+JsJKPGLkrIF
IHcFALVJNXtTWsl9adMDPlAtQ1AZME0XvoFsShDz5McVn0J6y2z5ceTHlB9pnEltheQVEllIXiGR
z7Ifz6Cz4c2h6XkDLTDUFlkOQYuk/5EUimTnIykUyc5HoeTJjlHVMgWPwifv++Yf6/XLy8uVadlp
Sbs8zml/FfMKAKA4Z2WzLuqEHa/yviqBCEZXJJXeUzC25c2rcIhAEM1LyzBt8jtvtp8+kUee9i+0
ZWTL2+aKkLuyJJ8R2pHPrGPtV5ZcgRIXNVLoF6vh62tpkToy9LIzexnxvRydWEY8lhGPZRxjGccY
IDBEezkMsZSLlRZLtmQQYhm8WCBvr2lAMhFVyDqPpKDmPy1Pi5KtSGaM4Xrlh/aFRV3Rs3Uj+VdN
3rw/QJ9u3v3xuPv8DhSsUw/+ocHtdeKRDNz0wF4GfjpesJrM+CQDx5ACHtFkHdG6Zq159dxkQLPO
jxFa8Ucl7hsl1l9Sss5518vRPa/Ovupe0r+i7qXnTzT5ytq+6Io6e5LiSybMtzacUzbK4ivDZFzo
tmm8UeL+NUeBAKNYsa5jdcbay5TR/tCymZ/rBAYxCbWsuP2ZlSUn7/787Y/Pv9592r27IB9/qsi7
T3+KFklbXpHu0DS87UnPaHVBICKkoq/kI8EeEEif9zI7UFsxU/UCzpGEI4bUjCUT8AsrwWlGek6e
eVGTQ3dBNFHyN5VwSQhwc3I4kA4DN5Ct4oL0OWvZ3yYS+IfTFI0moDt7P2Pl9KvkRYzVGgvI9U89
6YAq2ClvCc1YNyn+gnYm+bxIEsD2kHCMJPS1e080KO1/6sih+Z6W06ZhNbr1HatmL5ND9+g6yThP
wASt950KJ434oUcH2o6V6YT/lcMAcU4imlwQWifDwEjuXUW/gb6pMx9ayI+piYOeSIvIuBoZW34o
EwxMxOBv37N2EvrXoYOAcfg74T+Squg6ESDgVIc413kll8GbaB+E29DPkfI7LfdIkip4PWEfmYx8
ScENzUXax/nEmPzbvDKFWqcmUCxRuBxjqFy+O1WudWoBwiY5TD0Hlkmojz585KKkVVExRaGKYzV9
rGQtBRExF1nF+4LXa5iv6w55auZ6b/h9fqgiDXE7TAuh3ZfK/8uroj+h/CvyziqfEIvKH5sifiUP
kFyn3EfAefdHxNxCqCyUjDWvJ7R/+/btrO6BP9fsKM05j/en3EbeebdHxFy5pbR/weyj5J5nJ0y8
AOCshREwN4B1XSzBOe/5Cd0N8s4qnxBvtKuw/0yr6nR4csk9a0HHLMYfKgdLTo1sJphnDWiQuX5X
6n+gRXtKfYq8s9onxFx5MMSnaV7Jpmj7HEr2CSuRYp81NAMt2trmjLWwBpywEiP70Jw1omPeDPhg
5GSs4h9EKl6M05Cmd3V2UjNF3lndE2JxiH+pYf/TndC+F8yz6jXIYvH5Nz1k+Qn1JfLOap8Qc+We
ch7Wt1OuA+u84wNgOeiPL7ACnyptyDtf2kbEYjp+grX0hO4Kl9lzqkfAYkXYwJZhT/44legRsp9+
kOkz0NyKvRrOLg3vaHmqdir2+fKpgxZXRhxd2OjkcEok20Pb0+JU0GLJ/eGYv8UtDg6sCjWDulSe
7B4CniIA/Gh90GHLtvietafMIO+8hRHxJofVpuj3HPt6Mo17ZJ81MCGWty6HumOnNkadYJ6fJRNk
sXY8yOuI5eUHeeeXnxGxHJ0t5/uCkQ9Fe2qgY4E4n1ETZDiYpzaYckkWSlMZXpEYF6Z5YVoXpn1h
Oheme2F6citrCjH39jp39B2uOd0TfKBdI05AePRri07f5Xb8UCfrDBBXVWNr/RSCRZaV36NzOO0u
oB/hGJnjeW5BAKgLEls8NM7qmMIfWlhwvsf/UsR7OEVULOLJK4GT03eSe0AsCkIdGAQXhGAyL/Qn
hipWfYfuBHkB/yd9qWWY3+6WpeAr8JfM1LydzzAJx22zhl7nzu114ZK9J4cYciI6RBEOT+GpLCgg
C55N8jy7XLjES4VLPBAfmLw8G09Jz3COKnyyN2XaFKAzO7Cux3tGeQdVyAutZ3mn9jwIFv7t9d4y
yd4CU5sVNmxowHnstzSd3UcV/aGGxR02aqWwvj50a2is9RuPdZG4pm/aoed478vuxvw7rZp/Vv1N
gLZANep3FvR3YKApYdcGB+tM6e963jI00a0TBqW67N4XyQ3sI6/Q1Cce4TltIxU74l8TIxzfXncx
yfE67hg+bOytq250jw/iR97FGsdduLt/gNKbOwIpfuR1rHF0LN+59+WtrCaHkTxuLfwjj6LG0RY/
6kR6NIwwxGNsbgkLAVjwVehHZCHkNg/37m4rRwrlPA/DUfgzpKd7VrjSl/BO8BzZCsSlc2HP5KwZ
T3gd24YbYKn2dGTq6/1Lg1lrFjMqWnd3Gx/jSc0ZT9i7e9ia2x3ICd7O2W0eDHmTaxxT8aNudI+g
I8ToUmfGs/URo6K327t7f4clWvYofNg+7OQtrtYHV9dC/ZmcM/Mz0PuQzUYlE3LW1ts6d3JOajxh
3XZtx7bk5DaOgRVu7h5kAmstT/clc/UeZTLPdtBy1MwdPUuEL574kQ8NqMXzxDvCrH+JylbbNobX
B8gewxBaVLYa8iHCmPHMWcsask4g7YGHecaEL5bnBRtPPUCMkWCz/GRqHDwPdTJX7xETOr3dxtxt
5OOEcfTFj3yjgDEy7u8fQvlUoekUXj84MNb36nHiaO683b16o5gi8SzkAtsW9p5lPL3At/BRQ2Wy
729hjj2Hw9wUcrMxehY9cu3A2NjqseMY2hv/fiPfPMBP7z7EmRrN+ideQEavn2e1QDyLTBbMGW+W
u5HMQce7uzPkA4rWmus09QzZ23qP9o4ut1dRklmwn/V27+tREk8dR8f0/PBePnVM2RMLnYbpbjAL
4ll9iUM9l/bCszTxUsNTzyCCJ/y09JjFvs6LVQ3B/JFPJJoF0XfL9bydhwW+w6IuZklcMto+wYZB
7sRwqiiSOinhdIGFnZelhnIm2gDzJ1KaalBvTh/g4BBvcK+n2uA8bMC+0h529WAdVv5qOOPg5NJ5
UTlurXGm5QUubOqcjktRy3kFW98eHzFGP3BVmjGUYVyDMlqxp4ZmQr0iG4ocg9UM1D8V8ShiK95I
sAbCYA3nfA0brhqp0V5jBG8YSgUWArG5mQUPC8JEHaDuQKw15BQIjehoyEEa+suORa+hLE10QBkD
ShFwYau6rKF46lLHc1zeovIgB6VkqdKH69xIbqeTBNaCuOTdFDesB+KNVDWxY6xOol7rGNYFJTWS
wBfxOqCaniY2aA7HlFEE8LXkNBEPWyp5cAnGlCih85OkoRNHGxA13CQVdXPoR3kTdWbqIhMXbdHb
ST90WHcKF3I5FWZhxFV71mVc1IcAKghEJuGTk7hgi/Yggqs0vmNiWikKdLihXffC20QdsnBc9lmL
CTE87kmGr4ZhM9y7gGzLMogAawcaVrOqqIvRh1jNtKfuUCmKmmJP/GXUjYVO0MrxPharXUWPk/NY
4gRh0OwPpkbEvDoooqVU67UCC92I3A/9sUTyPqk6FE2+4N5D44x3N8Y0lk8R73teTQdV2EpHfHMA
4nCX5H6vXisXuE/R2XQ0bs8YCYRNnXVTW+wrh9EaqI7Ym92P7+wAEWcgZ+K18lJCHLzJaqbFMMeT
9CTwyOgcZdpLqEaew3SgFSwAN5wqxyL4bQHwiVHop4RU4vclc3A2Ge11srEIg0nPaJwPQNVc8usV
X0/J2NuO0a5ImI4UVwuPv3z89enu8+ffvpCvFKYgFJq2nXyc2LiG3l7H+R6yH/9PjGGw0LfTyLGN
0Ehdzw4DM6apZaXM8pMA/+NMfLNyXc+IAwcgIaNBEDhmStPYMSPT9izHSMA5QCUA8tMoCFxGHWpa
phn5lhVYNAlTljoueiItwy8ft7f/Bw==
Client-to-server data is XORed with a 4-byte mask (included in the data frame); frames sent from the server to the client are not masked. Some people suggest this is meant to throw off bad caching mechanisms that might respond to new WebSocket requests with server messages from older sessions. The masking ensures that even messages containing identical data will appear different to applications that do not understand WebSockets.
Also note that the frame header itself varies in size: the payload length takes 7 bits, 7+16 bits, or 7+64 bits, and a 4-byte masking key follows when the mask bit is set.
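The masking operation itself is a plain XOR with a 4-byte key that repeats over the payload (RFC 6455, section 5.3), so the same function both masks and unmasks. Below is a minimal Python sketch; the function name is hypothetical, and the key 0x37fa213d with its masked "Hello" bytes comes from the examples in RFC 6455, section 5.7.

```python
def apply_mask(data: bytes, key: bytes) -> bytes:
    """XOR each payload byte with key[i % 4]; applying it twice restores the data."""
    if len(key) != 4:
        raise ValueError("WebSocket masking keys are exactly 4 bytes")
    return bytes(b ^ key[i % 4] for i, b in enumerate(data))


masked = apply_mask(b"Hello", bytes([0x37, 0xFA, 0x21, 0x3D]))
# masked == bytes([0x7F, 0x9F, 0x4D, 0x51, 0x58]), as in RFC 6455, section 5.7
```

Because each client frame uses a fresh random key, two frames carrying the same text look different on the wire.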
Refer to RFC 6455 Section 5 which defines the masking/unmasking process for payloads sent from the client to the server.
https://www.rfc-editor.org/rfc/rfc6455
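Putting the header rules together, here is a minimal Python sketch of parsing a single frame from a byte buffer per RFC 6455, section 5.2; the names Frame and parse_frame are hypothetical. It handles the 7-, 16- and 64-bit length forms and unmasks client frames, but does not reassemble fragmented messages, validate control frames, or decompress anything. Since server frames are not masked, a server payload that still looks encoded once the header is removed is encoded by something layered on top, such as a negotiated compression extension (permessage-deflate, RFC 7692, signals compressed messages with the RSV1 bit) or the application's own format.

```python
from __future__ import annotations

import struct
from typing import NamedTuple


class Frame(NamedTuple):
    fin: bool
    rsv1: bool  # set on compressed messages when permessage-deflate is in use
    opcode: int  # 0x1 text, 0x2 binary, 0x8 close, 0x9 ping, 0xA pong
    payload: bytes  # already unmasked
    size: int  # total bytes consumed from the buffer


def parse_frame(buf: bytes) -> Frame | None:
    """Parse one frame from the start of buf; None if it is not complete yet."""
    if len(buf) < 2:
        return None
    b0, b1 = buf[0], buf[1]
    fin, rsv1, opcode = bool(b0 & 0x80), bool(b0 & 0x40), b0 & 0x0F
    masked, length = bool(b1 & 0x80), b1 & 0x7F
    pos = 2
    if length == 126:  # 16-bit extended payload length follows
        if len(buf) < pos + 2:
            return None
        (length,) = struct.unpack_from("!H", buf, pos)
        pos += 2
    elif length == 127:  # 64-bit extended payload length follows
        if len(buf) < pos + 8:
            return None
        (length,) = struct.unpack_from("!Q", buf, pos)
        pos += 8
    key = b""
    if masked:  # client-to-server frames carry a 4-byte masking key
        if len(buf) < pos + 4:
            return None
        key = buf[pos:pos + 4]
        pos += 4
    if len(buf) < pos + length:
        return None
    payload = buf[pos:pos + length]
    if masked:
        payload = bytes(b ^ key[i % 4] for i, b in enumerate(payload))
    return Frame(fin, rsv1, opcode, payload, pos + length)
```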
If you find any freeware VBA code that does the job of forming packets, let me know! :-)