I have this code:
import asyncore

class SrvHandler(asyncore.dispatcher_with_send):
    def handle_read(self):
        data = self.recv(1024)
        if data:
            data = str(data, 'utf-8')
            print("Received: " + data)
Now, I have a VBasic application. This application has two threads. The two threads run at the same time but use the same connection (declared as a single static connection). Sometimes the two threads send data at the same time, or at most a few milliseconds apart. The problem is that Python 3 receives these two different pieces of data as one single line of data.
For example:
VBasic Thread 1: socket.write("Hello")
VBasic Thread 2: socket.write("12345")
Python3 Data:
Expected Result:
First recv: Hello
Second recv: 12345
Actual Result:
First recv: Hello12345
Why does Python 3 act like this? What is the reason for this behavior? Is it by design? What can I do to prevent the two messages from being mixed together, on the Python 3 side or the VBasic side (client or server)?
I hope this example and code are informative enough.
This is expected behaviour for a streaming service like TCP.
TCP has no concept of a message at all; the only unit whose boundaries it preserves is the single byte, so anything longer can arrive split or merged.
If you want to transfer any structure longer than one byte, you need a protocol on top of TCP - maybe put a null or something at the end of your strings and send that as well. At the receiving end, concatenate all received bytes until you find a null.
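For illustration, here is a minimal sketch of that idea on the receiving side, based on the handler from the question (the inbox buffer and the assumption that the sender appends a null byte after every message are mine, not part of asyncore):
import asyncore

class SrvHandler(asyncore.dispatcher_with_send):
    def __init__(self, sock=None):
        asyncore.dispatcher_with_send.__init__(self, sock)
        self.inbox = b""  # bytes received so far that do not yet form a complete message

    def handle_read(self):
        self.inbox += self.recv(1024)
        # Each complete message ends with the null byte the sender appended.
        while b"\0" in self.inbox:
            message, self.inbox = self.inbox.split(b"\0", 1)
            print("Received: " + message.decode("utf-8"))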
Related
I'm working with a Camel flow that uses a Netty TCP socket consumer to receive messages from a client program (which is outside of my control). The client should be opening a socket, sending us one message, then closing the socket, but we've been seeing cases where instead of one message Camel is "splitting" the text stream into two parts and trying to process them separately.
So I'm trying to figure out, since you can re-use the same socket for multiple Camel messages, but TCP sockets don't have a built-in concept of "frames" or a standard for message delimiters, how does Camel decide that a complete message has been received and is ready to process? I haven't been able to find a documented answer to this in the Netty component docs (https://camel.apache.org/components/3.15.x/netty-component.html), although maybe I'm missing something.
From playing around with a test script, it seems like one answer is "Camel assumes a message is complete and should be processed if it goes more than 1ms without receiving any input on the socket". Is this a correct statement, and if so is this behavior documented anywhere? Is there any way to change or configure this behavior? Really what I would prefer is for Camel to wait for an ETX character (or a much longer timeout) before processing a message, is that possible to set up?
Here's my test setup:
Camel flow:
from("netty:tcp://localhost:3003")
.log("Received: ${body}");
Python snippet:
import math
import socket
import time

DELAY_MS = 3

def send_msg(sock, msg):
    print("Sending message: <{}>".format(msg))
    if sock.sendall(msg.encode()) is not None:
        print("Message failed to send")
    time.sleep(DELAY_MS / 1000.0)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    print("Using DELAY_MS: {}".format(DELAY_MS))
    s.connect((args.hostname, args.port))  # args.hostname, args.port, args.msg come from argparse (not shown)
    cutoff = int(math.floor(len(args.msg) / 2))
    msg1 = args.msg[:cutoff]
    send_msg(s, msg1)
    msg2 = args.msg[cutoff:]
    send_msg(s, msg2)
    response = s.recv(1024)
except Exception as e:
    print(e)
finally:
    s.close()
I can see that with DELAY_MS=1 Camel logs one single message:
2022-02-21 16:54:40.689 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sent over the socket
But with DELAY_MS=2 it logs two separate messages:
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: a long string sen
2022-02-21 16:56:12.899 INFO 19429 --- [erExecutorGroup] route1 : Received: t over the socket
After doing some more research, it seems like what I need to do is add a delimiter-based FrameDecoder to the decoders list.
Setting it up like this:
from("netty:tcp://localhost:3003?sync=true"
+ "&decoders=#frameDecoder,#stringDecoder"
+ "&encoders=#stringEncoder")
where frameDecoder is provided by
@Bean
ChannelHandlerFactory frameDecoder() {
    ByteBuf[] ETX_DELIM = new ByteBuf[] { Unpooled.wrappedBuffer(new byte[] { (byte) 3 }) };
    return ChannelHandlerFactories.newDelimiterBasedFrameDecoder(1024, ETX_DELIM,
            false, "tcp");
}
seems to do the trick.
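On the client side, the test script then has to terminate each message with the same ETX byte; a hypothetical tweak to the send_msg helper above:
ETX = b"\x03"

def send_msg(sock, msg):
    print("Sending message: <{}>".format(msg))
    # Append the ETX delimiter so the delimiter-based frame decoder can find the frame boundary.
    sock.sendall(msg.encode() + ETX)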
On the flip side, though, it seems like this will hang indefinitely (or until lower-level TCP timeouts kick in?) if an ETX frame is never received, and I can't figure out any way to set a timeout on the decoder, so I would still be eager for input if anyone knows how to do that.
I think the default "timeout" behavior I was seeing might've just been an artifact of Netty's read loop speed -- How does netty determine when a read is complete?
I'm trying to create a server that sets up a Unix socket and listens for clients which send/receive data. I've made a small repository to recreate the problem.
The server runs and it can receive data from the clients that connect, but I can't get the server response to be read from the client without an error on the server.
I have commented out the offending code on the client and server. Uncomment both to recreate the problem.
When the code to respond to the client is uncommented, I get this error on the server:
thread '' panicked at 'called Result::unwrap() on an Err value: Os { code: 11, kind: WouldBlock, message: "Resource temporarily unavailable" }', src/main.rs:77:42
MRE Link
Your code calls set_read_timeout to set the timeout on the socket. Its documentation states that on Unix it results in a WouldBlock error in case of timeout, which is precisely what happens to you.
As to why your client times out, the likely reason is that the server calls stream.read_to_string(&mut response), which reads the stream until end-of-file. On the other hand, your client calls write_all() followed by flush(), and (after uncommenting the offending code) attempts to read the response. But the attempt to read the response means that the stream is not closed, so the server will wait for EOF, and you have a deadlock on your hands. Note that none of this is specific to Rust; you would have the exact same issue in C++ or Python.
To fix the issue, you need to use a protocol in your communication. A very simple protocol could consist of first sending the message size (in a fixed format, perhaps 4 bytes in length) and only then the actual message. The code that reads from the stream would do the same: first read the message size and then the message itself. Even better than inventing your own protocol would be to use an existing one, e.g. to exchange messages using serde.
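For illustration, a minimal sketch of such a length-prefixed protocol, shown in Python since the issue is not Rust-specific (the helper names send_msg, recv_msg and recv_exact are assumptions):
import struct

def send_msg(sock, payload: bytes) -> None:
    # 4-byte big-endian length prefix, then the message itself.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def recv_exact(sock, n: int) -> bytes:
    # Keep reading until exactly n bytes have arrived (or the peer closes the stream).
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        data += chunk
    return data

def recv_msg(sock) -> bytes:
    (length,) = struct.unpack(">I", recv_exact(sock, 4))
    return recv_exact(sock, length)
With this kind of framing neither side has to rely on EOF, so the client can read the response while keeping the connection open.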
I'm attempting to implement the Remote Frame Buffer protocol using Ada's Sockets library and I'm having trouble controlling the length of the packets that I'm sending.
I'm following the RFC 6143 specification (https://tools.ietf.org/pdf/rfc6143.pdf), see comments in the code for section numbers...
-- Section 7.1.1
String'Write (Comms, Protocol_Version);
Put_Line ("Server version: '"
          & Protocol_Version (1 .. 11) & "'");
String'Read (Comms, Client_Version);
Put_Line ("Client version: '"
          & Client_Version (1 .. 11) & "'");

-- Section 7.1.2
-- Server sends security types
U8'Write (Comms, Number_Of_Security_Types);
U8'Write (Comms, Security_Type_None);
-- client replies by selecting a security type
U8'Read (Comms, Client_Requested_Security_Type);
Put_Line ("Client requested security type: "
          & Client_Requested_Security_Type'Image);

-- Section 7.1.3
U32'Write (Comms, Byte_Reverse (Security_Result));

-- Section 7.3.1
U8'Read (Comms, Client_Requested_Shared_Flag);
Put_Line ("Client requested shared flag: "
          & Client_Requested_Shared_Flag'Image);
Server_Init'Write (Comms, Server_Init_Rec);
The problem seems to be (according to wireshark) that my calls to the various 'Write procedures are causing bytes to queue up on the socket without getting sent.
Consequently two or more packet's worth of data are being sent as one and causing malformed packets. Sections 7.1.2 and 7.1.3 are being sent consecutively in one packet instead of being broken into two.
I had wrongly assumed that 'Reading from the socket would cause the outgoing data to be flushed out, but that does not appear to be the case.
How do I tell Ada's Sockets library "this packet is finished, send it right now"?
To emphasize user207421's comment (https://stackoverflow.com/users/207421/user207421):
I'm not a protocols guru, but from my own experience the usage of TCP (see RFC 793) is often misunderstood.
The problem seems to be (according to wireshark) that my calls to the various 'Write procedures are causing bytes to queue up on the socket without getting sent.
Consequently two or more packet's worth of data are being sent as one and causing malformed packets. Sections 7.1.2 and 7.1.3 are being sent consecutively in one packet instead of being broken into two.
In short, TCP is not message-oriented.
With TCP, sending/writing to the socket only appends data to the TCP stream. The stack is free to deliver it in one exchange or several, and if you have lengthy data to send and a message-oriented protocol to implement on top of TCP, you may need to handle message reconstruction yourself. Usually a special end-of-message sequence of characters is added at the end of each message.
Processes transmit data by calling on the TCP and passing buffers of data as arguments. The TCP packages the data from these buffers into segments and calls on the internet module to transmit each segment to the destination TCP. The receiving TCP places the data from a segment into the receiving user's buffer and notifies the receiving user. The TCPs include control information in the segments which they use to ensure reliable ordered data transmission.
See also https://stackoverflow.com/a/11237634/7237062, quoting:
TCP is a stream-oriented connection, not message-oriented. It has no concept of a message. When you write out your serialized string, it only sees a meaningless sequence of bytes. TCP is free to break that stream up into multiple fragments and they will be received at the client in those fragment-sized chunks. It is up to you to reconstruct the entire message on the other end.
In your scenario, one would typically send a message length prefix. This way, the client first reads the length prefix so it can then know how large the incoming message is supposed to be.
or TCP Connection Seems to Receive Incomplete Data, quoting:
The recv function can receive as little as 1 byte, you may have to call it multiple times to get your entire payload. Because of this, you need to know how much data you're expecting. Although you can signal completion by closing the connection, that's not really a good idea.
Update:
I should also mention that the send function has the same conventions as recv: you have to call it in a loop because you cannot assume that it will send all your data. While it might always work in your development environment, that's the kind of assumption that will bite you later.
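For illustration, a minimal sketch of such a send loop (Python; the helper name send_all is an assumption, and Python's built-in socket.sendall does the same job):
def send_all(sock, data: bytes) -> None:
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])  # send() may accept only part of the buffer
        if sent == 0:
            raise ConnectionError("socket connection broken")
        total += sent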
We are writing a message broker in Haskell (HMB). Messages therefore have to be parsed (Data.Binary) after they are received from the socket (Network.Socket). We've been testing on loopback (localhost) so far, for both producing and parsing messages, and this worked quite well. If we benchmark by producing messages from another machine, we run into problems: suddenly the parser does not have enough bytes to parse.
The first 4 bytes of each message define the length of the message and thus describe the message to be parsed. As hinted above, we do the parsing with Data.Binary, so it is lazy. For testing purposes we switched parsing of the first 4 bytes to strict by using the cereal library. This gave the same problem. We even tried to parse the requests completely with cereal only, and the problem still remains.
In the code you'll see that we use threading. However, we also tried without a channel (a single-threaded setup), but this didn't solve the problem either.
Here is a part of the code (Thread1) where the received bytes are written to a channel to be further consumed/parsed. (As mentioned, nothing changes if we omit channeling and directly parse input):
runConnection :: (Socket, SockAddr) -> RequestChan -> Bool -> IO ()
runConnection conn chan False = return ()
runConnection conn chan True  = do
  r <- recvFromSock conn
  case r of
    Left e -> do
      handleSocketError conn e
      runConnection conn chan False
    Right input -> do
      threadDelay 5000 -- THIS FIXES THE PROBLEM!?
      writeToReqChan conn chan input
      runConnection conn chan True
Here is the part (Thread2) where the input is being parsed:
runApiHandler :: RequestChan -> ResponseChan -> IO ()
runApiHandler rqChan rsChan = do
  (conn, req) <- readChan rqChan
  case readRequest req of -- readRequest IS THE PARSER
    Left (bs, bo, e)   -> handleHandlerError conn $ ParseRequestError e
    Right (bs, bo, rm) -> do
      res <- handleRequest rm
      case res of
        Left e   -> handleHandlerError conn e
        Right bs -> writeToResChan conn rsChan bs
  runApiHandler rqChan rsChan
Now I have figured out that if the parsing is delayed a bit (see the threadDelay in the first code block), everything works fine. Which basically means the parser doesn't wait for the bytes received from the socket.
Why is that? Why does the parser not wait for the socket to have enough bytes? Is there a general mistake in our setup?
I would bet that the problem has nothing to do with the parser but is instead due to the blocking semantics of UNIX sockets.
While a loopback interface will likely pass the packet directly from the sender to the receiver, an Ethernet interface may need to break up the packet to fit within the Maximum Transmission Unit (MTU) of the link. This is known as packet fragmentation.
The len argument to the recv system call is merely the upper bound on the received length (e.g. the size of the target buffer); the call may produce less data than you ask for. To quote the manpage,
If no messages are available at the socket, the receive calls wait for a
message to arrive, unless the socket is nonblocking (see fcntl(2)), in which
case the value -1 is returned and the external variable errno is set to
EAGAIN or EWOULDBLOCK. The receive calls normally return any data
available, up to the requested amount, rather than waiting for receipt of
the full amount requested.
For this reason, you may need multiple recv calls to retrieve the entire packet. Your example works if you delay the recv as the operating system can reassemble the original packet since all fragments have arrived by the time it is requested.
As meiersi pointed out, there are a variety of streaming I/O libraries that have developed in the Haskell world for solving this problem, among others. These include pipes, conduit, io-streams, and others. Depending upon your goals, this may be a natural way to handle this issue.
You might want to try the socket support in conduit-extra combined with binary-conduit to properly handle the parsing of the chunked streaming, which happens due to the reasons pointed out by bgamari.
First of all, consider yourself lucky to observe this. On many platforms perhaps only one out of a thousand packets exhibit this behaviour, causing a lot of such (sorry) bad networking code to fail seldom and randomly.
The problem is that you start processing before the data is ready. Instead of the threadDelay (which introduces a permanent delay and might not be long enough in all cases), the solution is to make sure you have at least one item/message/packet to process before you start processing it. Your protocol, where the first 32-bit word contains the length, is perfect for this. Read data until you have at least 4 bytes (the length). Then read data until you have the required number of bytes. If any call to recvFromSock returns fewer bytes than required, call it again to get more. Remember to also handle the case of 0 bytes; this means the other party closed the connection.
I have implemented this for a similar protocol (SMPP, where packets also start with the length) and it works perfectly.
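For illustration, a language-neutral sketch of that buffering idea in Python (the names MessageBuffer and feed are made up and not part of HMB): received chunks are appended to a buffer, and a message is only handed to the parser once the 4-byte length prefix and the full payload are available.
import struct

class MessageBuffer:
    def __init__(self):
        self.buf = b""

    def feed(self, chunk: bytes):
        """Append newly received bytes and yield every complete message now available."""
        self.buf += chunk
        while len(self.buf) >= 4:
            (length,) = struct.unpack(">I", self.buf[:4])
            if len(self.buf) < 4 + length:
                break  # wait for more data from the socket
            yield self.buf[4:4 + length]
            self.buf = self.buf[4 + length:]
The receiving thread would call feed with every chunk returned by the socket and pass each yielded message on to the parser (readRequest in the code above).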
I'm sending data over a UDP socket and receiving it in a loop with read().
The input data looks like this:
String 1
String 2
String 3
....
I write the data out with send(), one string after the other (in a loop).
How do I make sure that I can reconstruct the data on the receive end in the correct fashion (as I put the strings in)?
The received data can be split anywhere in the middle of the lines like so:
Packet 0: Stri
Packet 1: ng 1
Packet 2: String 2 St
Packet 3: ring 3
...
Do I have to introduce a custom END OF MESSAGE byte sequence to tell where a message ends? EOF won't help here.
I need to be able to tell if a packet is corrupted, and where the data blocks that belong together begin and end, since I sent them beginning with "S" and ending with the number! I can't use TCP because I need broadcast/multicast support.
If you want all messages to arrive, in the same order they were sent, and to have an "end of message" indication, maybe TCP is better :-)
(TCP gives you reliable, ordered delivery out of the box; you would only need to add your own end-of-message marker on top of it, as discussed above.)