What is meant by record or data boundaries in the context of the TCP & UDP protocols?

I am learning sockets and came across the term data (or record) boundaries in connection with the SOCK_SEQPACKET socket type. Can anyone explain in simple words what a data boundary is, and how SOCK_SEQPACKET differs from SOCK_STREAM and SOCK_DGRAM?

This answer https://stackoverflow.com/a/9563694/1076479 has a good succinct explanation of message boundaries (a different name for "record boundaries").
Extending that answer to SOCK_SEQPACKET:
SOCK_STREAM provides reliable, sequenced communication of streams of data between two peers. It does not maintain message (record) boundaries, which means the application must manage its own boundaries on top of the stream provided.
SOCK_DGRAM provides unreliable transmission of datagrams. Datagrams are self-contained capsules and their boundaries are maintained. That means if you send a 20 byte buffer on peer A, peer B will receive a 20 byte message. However, they can be dropped, or received out of order, and it's up to the application to figure that out and handle it.
SOCK_SEQPACKET is a newer technology that is not yet widely used, but tries to marry the benefits of both of the above. That is, it provides reliable, sequenced communication that also transmits entire "datagrams" as a unit (and hence maintains message boundaries).
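As a quick illustration, Linux supports SOCK_SEQPACKET over AF_UNIX sockets. A minimal Python sketch (assumes Linux; the buffer size is arbitrary):

import socket

# SOCK_SEQPACKET: reliable and sequenced, yet each send() is one message.
a, b = socket.socketpair(socket.AF_UNIX, socket.SOCK_SEQPACKET)
a.send(b'FOO')      # first message
a.send(b'BAR')      # second message
print(b.recv(128))  # b'FOO' -- one recv() returns exactly one message
print(b.recv(128))  # b'BAR'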
It's easiest to demonstrate the concept of message boundaries by showing what happens when they're neglected. Beginners often post client code like this here on SO (using Python for convenience):
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('192.168.4.122', 9000))
s.send(b'FOO')      # Send string 1
s.send(b'BAR')      # Send string 2
reply = s.recv(128) # Receive reply
And server code similar to this:
import socket

lsock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
lsock.bind(('', 9000))
lsock.listen(5)
csock, caddr = lsock.accept()
string1 = csock.recv(128) # Receive first string
string2 = csock.recv(128) # Receive second string <== XXXXXXX
csock.send(b'Got your messages') # Send reply
They then don't understand why the server hangs on the second recv call while the client is hung on its own recv call. That happens because both strings the client sent may get bundled together and received as a single unit by the first recv on the server side. That is, the message boundary between the two logical messages was not preserved, and string1 will often contain both chunks run together: 'FOOBAR'
(Often there are other timing-related aspects to the code that influence when/whether that actually happens or not.)
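A common remedy is to add explicit framing on top of the stream, e.g. a length prefix. Here is a minimal Python sketch (the 4-byte big-endian prefix and the helper names send_msg/recv_exactly/recv_msg are illustrative choices, not part of the original question):

import struct

def send_msg(sock, payload):
    # Frame each message with its length (4 bytes, big-endian).
    sock.sendall(struct.pack('>I', len(payload)) + payload)

def recv_exactly(sock, n):
    # recv() may return fewer bytes than requested, so loop.
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('peer closed the connection')
        buf += chunk
    return buf

def recv_msg(sock):
    (length,) = struct.unpack('>I', recv_exactly(sock, 4))
    return recv_exactly(sock, length)

With this framing, the server above would call recv_msg twice and reliably get b'FOO' and b'BAR' as two separate messages.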

Related

TCP/IP using Ada Sockets: How to correctly finish a packet? [duplicate]

I'm attempting to implement the Remote Frame Buffer protocol using Ada's Sockets library and I'm having trouble controlling the length of the packets that I'm sending.
I'm following the RFC 6143 specification (https://tools.ietf.org/pdf/rfc6143.pdf); see the comments in the code for section numbers:
-- Section 7.1.1
String'Write (Comms, Protocol_Version);
Put_Line ("Server version: '"
          & Protocol_Version (1 .. 11) & "'");
String'Read (Comms, Client_Version);
Put_Line ("Client version: '"
          & Client_Version (1 .. 11) & "'");

-- Section 7.1.2
-- Server sends security types
U8'Write (Comms, Number_Of_Security_Types);
U8'Write (Comms, Security_Type_None);
-- client replies by selecting a security type
U8'Read (Comms, Client_Requested_Security_Type);
Put_Line ("Client requested security type: "
          & Client_Requested_Security_Type'Image);

-- Section 7.1.3
U32'Write (Comms, Byte_Reverse (Security_Result));

-- Section 7.3.1
U8'Read (Comms, Client_Requested_Shared_Flag);
Put_Line ("Client requested shared flag: "
          & Client_Requested_Shared_Flag'Image);
Server_Init'Write (Comms, Server_Init_Rec);
The problem seems to be (according to Wireshark) that my calls to the various 'Write procedures are causing bytes to queue up on the socket without getting sent.
Consequently, two or more packets' worth of data are being sent as one, causing malformed packets. Sections 7.1.2 and 7.1.3 are being sent consecutively in one packet instead of being broken into two.
I had wrongly assumed that 'Reading from the socket would cause the outgoing data to be flushed out, but that does not appear to be the case.
How do I tell Ada's Sockets library "this packet is finished, send it right now"?
To emphasize https://stackoverflow.com/users/207421/user207421's comment:
I'm not a protocols guru, but from my own experience, the usage of TCP (see RFC 793) is often misunderstood.
The problem seems to be (according to Wireshark) that my calls to the various 'Write procedures are causing bytes to queue up on the socket without getting sent.
Consequently, two or more packets' worth of data are being sent as one, causing malformed packets. Sections 7.1.2 and 7.1.3 are being sent consecutively in one packet instead of being broken into two.
In short, TCP is not message-oriented.
Using TCP, sending/writing to a socket only appends data to the TCP stream. The stack is free to deliver that data in one exchange or several, so if you have lengthy data to send and a message-oriented protocol to implement on top of TCP, you may need to handle message reconstruction yourself. Usually a special end-of-message character sequence is appended to each message.
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module to transmit each segment to the destination TCP. The receiving TCP places the data from a segment into the receiving user's buffer and notifies the receiving user. The TCPs include control information in the segments which they use to ensure reliable ordered data transmission.
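As a concrete illustration of the end-of-message delimiter approach mentioned above, a hedged Python sketch (the newline delimiter and the buffering helper are illustrative, not from this thread):

def recv_lines(sock):
    # Buffer across recv() calls and split on the delimiter.
    buf = b''
    while True:
        chunk = sock.recv(4096)
        if not chunk:           # connection closed
            break
        buf += chunk
        while b'\n' in buf:
            line, buf = buf.split(b'\n', 1)
            yield line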
See also https://stackoverflow.com/a/11237634/7237062, quoting:
TCP is a stream-oriented connection, not message-oriented. It has no concept of a message. When you write out your serialized string, it only sees a meaningless sequence of bytes. TCP is free to break that stream up into multiple fragments, and they will be received at the client in those fragment-sized chunks. It is up to you to reconstruct the entire message on the other end.
In your scenario, one would typically send a message length prefix. This way, the client first reads the length prefix so it can then know how large the incoming message is supposed to be.
or TCP Connection Seems to Receive Incomplete Data, quoting:
The recv function can receive as little as 1 byte, you may have to call it multiple times to get your entire payload. Because of this, you need to know how much data you're expecting. Although you can signal completion by closing the connection, that's not really a good idea.
Update:
I should also mention that the send function has the same conventions as recv: you have to call it in a loop because you cannot assume that it will send all your data. While it might always work in your development environment, that's the kind of assumption that will bite you later.
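To illustrate the send-side convention from that quote, a minimal Python sketch (Python's own sock.sendall does the same job internally; the helper name is illustrative):

def send_all(sock, data):
    # send() may transmit only part of the buffer, so loop
    # until every byte has been handed to the kernel.
    view = memoryview(data)
    while view:
        sent = sock.send(view)
        view = view[sent:]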

Confusion with AF_INET with SOCK_RAW as the socket type vs. AF_PACKET with SOCK_DGRAM and SOCK_RAW as the socket type

I am quite new to network programming and have been trying to wrap my head around this for quite some time now. After going through numerous resources on the internet, I have drawn the conclusions below, followed by my points of confusion.
Conclusion 1:
When we are talking about creating a socket as:
s = socket(AF_INET, SOCK_RAW, 0);
we are basically trying to create a raw socket. With a raw socket created this way, one can bypass the TCP/UDP layer in the OSI stack. That is, when a packet is received by the application over this socket, the packet contains the network layer (layer 3) headers, wrapping the layer 4 headers, wrapping the actual data. The application is then free to process this packet beyond layer 3 in any way it wants.
Similarly, when sending a packet through this socket, the application is free to handle packet creation down to layer 4 and then pass the packet to layer 3, from which point onward the kernel handles things.
Conclusion 2: When we are talking about creating a socket as:
s = socket(AF_PACKET, SOCK_RAW, 0);
we are again trying to create a raw socket. With a raw socket created this way, one can bypass all the layers of the OSI stack altogether.
A pure raw packet is made available to the userland application, which is free to do whatever it wants with it. Packets received over such a socket have all their headers intact, and the application has access to all of them.
Similarly, when sending data over such a socket, the user application has to handle everything regarding the creation of the packet, wrapping the actual data with the headers of each layer before it is placed on the physical medium to be transmitted.
Conclusion 3: When we are talking about creating a socket as:
s = socket(AF_PACKET, SOCK_DGRAM, 0);
we are again trying to create a raw socket. With a raw socket created this way, one can bypass the data link layer (layer 2) of the OSI stack. That is, when a packet is received over such a socket by the userland application, the data link layer header has already been removed.
Similarly, while sending a packet through this socket, a suitable data link layer header is added to the packet, based on the information in the sockaddr_ll destination address.
Now below are my queries/points of confusion:
Are the conclusions that I have drawn above about raw sockets correct?
I did not quite understand conclusion 3 above. Can someone please explain? Does it mean that when the userland application receives a packet through this socket, only the data link layer headers have been handled by the kernel, so the packet would start directly with the layer 3 header, followed by the higher-layer headers wrapping the actual data?
If the conclusions drawn above are correct, conclusions 1 and 2 still make sense. But if conclusion 3 (and my speculation about it in point 2 above) is correct, when exactly would any application need to do that?
Some resources that I have referred to while trying to understand the above:
https://docs.freebsd.org/44doc/psd/21.ipc/paper.pdf
https://sock-raw.org/papers/sock_raw
https://www.quora.com/in/Whats-the-difference-between-the-AF_PACKET-and-AF_INET-in-python-socket
http://www.linuxcertif.com/man/7/PF_PACKET/
http://opensourceforu.com/2015/03/a-guide-to-using-raw-sockets/
'SOCK_RAW' option in 'socket' system call
http://stevendanna.github.io/blog/2013/06/23/a-short-sock-raw-adventure/
https://www.intervalzero.com/library/RTX/WebHelp/Content/PROJECTS/Application%20Development/Understanding_Network/Using_RAW_Sockets.htm
You got quite close to the real explanation. Here is what I think you are missing or getting wrong.
First, for s = socket(AF_INET, SOCK_RAW, 0);: when a packet is received over such a socket, it will always contain the IP header. For sending, the packet must contain the IP header only if IP_HDRINCL is enabled; otherwise the TCP/IP stack generates it for you. Everything from the IP layer upward is available through this socket.
Secondly, s = socket(AF_PACKET, SOCK_RAW, 0);:
This is a special type of raw socket, called a packet socket on Linux. This type of socket lets you send and receive packets at OSI layer 2, which is why the APIs for such sockets are referred to as link-layer APIs. Any protocol can be implemented on top of the physical layer using this socket. Interestingly, we can also interact with a packet's trailer through this socket, though we rarely need to.
Thirdly, in the case of s = socket(AF_PACKET, SOCK_DGRAM, 0);, your conclusion is right. With this type of packet socket, you don't need to think about the Ethernet header; it operates one layer above the previous type.
So we can say that the main distinction among these socket types is what they give access to. To summarize:
Raw-Socket access:
| Layer 3 header | Layer 4 header | Payload |
Packet-Socket access:
| Layer 2 header | Layer 3 header | Layer 4 header | Payload | Layer 2 trailer |
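For reference, here is how the three socket types discussed above can be created in Python on Linux (a sketch: all three require root or CAP_NET_RAW, 0x0003 is ETH_P_ALL, and IPPROTO_TCP is chosen purely for illustration):

import socket

# AF_INET raw socket: the kernel delivers IP header + layer 4 header + payload.
ip_raw = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_TCP)
# With IP_HDRINCL enabled, outgoing packets must include our own IP header:
# ip_raw.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1)

# Packet socket, SOCK_RAW: complete layer 2 frames, link-layer header included.
eth_raw = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(0x0003))

# Packet socket, SOCK_DGRAM: same traffic, but the kernel removes (on receive)
# or constructs (on send) the link-layer header for you.
cooked = socket.socket(socket.AF_PACKET, socket.SOCK_DGRAM, socket.htons(0x0003))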

Receiving data from a Lua TCP socket without a data size

I've been working on a TCP socket connection to a game server. The big problem here is that the game server sends the data without any separators, since it sends the packet length inside the data, making it impossible to use socket:receive("*a") or "*l". The data received from the server does not have a static size and is sent in HEX format. I'm using this solution:
while true do
  local rect, r, st = socket.select({_S.sockets.main, _S.sockets.bulle}, nil, 0.2)
  for i, con in ipairs(rect) do
    resp, err, part = con:receive(1)
    if resp ~= nil then
      dataRecv = dataRecv..resp
    end
  end
end
As you can see, I can only get all the data from the socket by reading one byte at a time and appending it to a string, which is not a good approach since I have two sockets to read. Is there a better way to receive data from this socket?
I don't think there is any other option; usually in a situation like this the client reads a packet of a specific length to figure out how much it needs to read from the rest of the stream. Some protocols combine a newline and the length; for example, HTTP uses line separators for the headers, with one of the headers specifying the length of the content that follows them.
Still, you don't need to read the stream one character at a time, as you can switch to a non-blocking read and request any number of characters. If there is not enough to read, you'll get the partially read content plus a "timeout" signal, which you can handle in your logic; from the documentation:
In case of error, the method returns nil followed by an error message, which can be the string 'closed' in case the connection was closed before the transmission was completed, or the string 'timeout' in case there was a timeout during the operation. Also, after the error message, the function returns the partial result of the transmission.
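For comparison, the same pattern as a Python sketch (non-blocking reads that take whatever has arrived instead of one byte at a time; LuaSocket's receive with a zero timeout behaves analogously, and the function name here is illustrative):

def drain(sock, buf):
    # Non-blocking: collect everything currently available, then return.
    sock.setblocking(False)
    try:
        while True:
            chunk = sock.recv(4096)
            if not chunk:          # peer closed the connection
                break
            buf += chunk
    except BlockingIOError:
        pass                       # nothing more to read right now
    return buf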

Parsing ByteString from Socket fails

We are writing a message broker in Haskell (HMB), so messages have to be parsed (Data.Binary) after they are received from the socket (Network.Socket). We've been testing on loopback (localhost) so far, both producing and parsing messages, and this worked quite well. But when we benchmark by producing messages from another machine, we run into problems: suddenly the parser does not have enough bytes to parse.
The first 4 bytes of each message define the length of the message and thus describe the message to be parsed. As hinted above, we do the parsing with Data.Binary, so it is lazy. For testing purposes we switched the parsing of the first 4 bytes to strict by using the cereal library; this had the same problem. We then even tried to parse the requests completely with cereal only, and the problem remained.
In the code you'll see that we use threading. However, we also tried without a channel (a single-threaded setup), but this didn't solve the problem either.
Here is a part of the code (Thread1) where the received bytes are written to a channel to be further consumed/parsed. (As mentioned, nothing changes if we omit channeling and directly parse input):
runConnection :: (Socket, SockAddr) -> RequestChan -> Bool -> IO ()
runConnection conn chan False = return ()
runConnection conn chan True = do
  r <- recvFromSock conn
  case r of
    Left e -> do
      handleSocketError conn e
      runConnection conn chan False
    Right input -> do
      threadDelay 5000 -- THIS FIXES THE PROBLEM!?
      writeToReqChan conn chan input
      runConnection conn chan True
Here is the part (Thread2) where input is being parsed:
runApiHandler :: RequestChan -> ResponseChan -> IO ()
runApiHandler rqChan rsChan = do
  (conn, req) <- readChan rqChan
  case readRequest req of -- readRequest IS THE PARSER
    Left (bs, bo, e) -> handleHandlerError conn $ ParseRequestError e
    Right (bs, bo, rm) -> do
      res <- handleRequest rm
      case res of
        Left e   -> handleHandlerError conn e
        Right bs -> writeToResChan conn rsChan bs
  runApiHandler rqChan rsChan
Now I have figured out that if the parsing is delayed a bit (see the threadDelay in the first code block), everything works fine. Which basically means the parser doesn't wait for the bytes to be received from the socket.
Why is that? Why does the parser not wait for the socket to have enough bytes? Is there a general mistake in our setup?
I would bet that the problem has nothing to do with the parser but is instead due to the blocking semantics of UNIX sockets.
While a loopback interface will likely pass the packet directly from the sender to the receiver, an Ethernet interface may need to break up the packet to fit in the Maximum Transmission Unit (MTU) of the link. This is known as packet fragmentation.
The len argument to the recv system call is merely the upper bound on the received length (e.g. the size of the target buffer); the call may produce less data than you ask for. To quote the manpage,
If no messages are available at the socket, the receive calls wait for a message to arrive, unless the socket is nonblocking (see fcntl(2)), in which case the value -1 is returned and the external variable errno is set to EAGAIN or EWOULDBLOCK. The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested.
For this reason, you may need multiple recv calls to retrieve the entire packet. Your example works when you delay the recv because by the time the data is requested all fragments have arrived and the operating system can reassemble the original packet.
As meiersi pointed out, there are a variety of streaming I/O libraries that have been developed in the Haskell world for solving this and similar problems, including pipes, conduit, and io-streams. Depending upon your goals, one of these may be a natural way to handle this issue.
You might want to try the socket support in conduit-extra combined with binary-conduit to properly handle the parsing of the chunked streaming, which happens due to the reasons pointed out by bgamari.
First of all, consider yourself lucky to observe this. On many platforms perhaps only one out of a thousand packets exhibits this behaviour, causing a lot of such (sorry) bad networking code to fail seldom and randomly.
The problem is that you start processing before the data is ready. Instead of the threadDelay (which introduces a permanent delay and might not be long enough in all cases), the solution is to make sure you have at least one complete item/message/packet before you start processing it. Your protocol, where the first 32-bit word contains the length, is perfect for this. Read data until you have at least 4 bytes (the length), then read data until you have the required number of bytes. If any call to recvFromSock returns fewer bytes than required, call it again to get more. Remember to also handle the case of 0 bytes; this means the other party closed the connection.
I have implemented this for a similar protocol (SMPP, whose packets also start with the length) and it works perfectly.
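A Python sketch of that recipe (the function name is illustrative; the same structure carries over to a Haskell recvFromSock loop): read until you have the 4-byte length, then read until you have the whole message, treating 0 bytes as a closed connection.

import struct

def recv_frame(sock):
    # First read until we have the 4-byte length word.
    header = b''
    while len(header) < 4:
        chunk = sock.recv(4 - len(header))
        if not chunk:              # 0 bytes: other party closed the connection
            return None
        header += chunk
    (length,) = struct.unpack('>I', header)
    # Then read until we have the complete message.
    body = b''
    while len(body) < length:
        chunk = sock.recv(length - len(body))
        if not chunk:
            return None            # closed mid-message
        body += chunk
    return body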

Perl - Sending multiple values through sockets

There's a thing I'm wondering about when it comes to socket programming in Perl. I'm trying to send two variables through my socket. It works, in that I can send both, but I want to receive them one by one. Let me show you my code and the output I get:
SERVER
my $var1 = 400;
chomp($var1);
$socket->send($var1);
my $var2 = 300;
chomp($var2);
$socket->send($var2);
CLIENT
$socket->recv(my $var1, 4000);
chomp($var1);
$socket->recv(my $var2, 4000);
chomp($var2);
print "From server: My height is: $var1 cm, weight is: $var2 kg\n";
Well, my expected output should be: From server: My height is: 400 cm, weight is: 300 kg.
Instead, my output looks like this: From server: My height is: 400300 cm, weight is:
Well, I can't see what's wrong with my code. Shouldn't I be able to receive the data one by one like this? How would I fix this to receive the data correctly?
Short answer: Use datagram sockets or implement a communication protocol that delimits distinct messages.
You ask:
Shouldn't I be able to receive the data one by one like this?
You can do that easily on datagram sockets like UDP: IO::Socket::INET->new(Type => SOCK_DGRAM, ...). There the writer/reader protocol is transmitting and receiving discrete packets of data, and send/recv will do what you want.
Stream sockets like TCP, on the other hand, have a continuous byte-stream model. Here you need some sort of record separator, an initial transmission of the message size, or similar. I suspect this is what you are using.
Stream sockets are not unlike plain files, which appear as "bytestream" IO handles. If a file contained the string "400300", without any newlines, how would a reader of that file know that this string was actually two records instead of one (or six, or five, or an incomplete record, etc.)?
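To close the loop on the datagram suggestion, a minimal Python sketch showing that with SOCK_DGRAM each send is received as one discrete message (a local UDP demo; port 0 asks the OS for a free port):

import socket

rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(('127.0.0.1', 0))
addr = rx.getsockname()

tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
tx.sendto(b'400', addr)   # height
tx.sendto(b'300', addr)   # weight

print(rx.recv(4000))  # b'400' -- boundaries preserved
print(rx.recv(4000))  # b'300'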