I apologize beforehand if some of these questions are obvious to expert network programmers. I have researched and read about network programming, and it is still not clear to me how to do this.
Assume that I want to write a TCP proxy (in Go) sitting between some TCP client and some TCP server. Something like this:
First, assume that these connections are semi-permanent (they will be closed after a long, long while) and that I need the data to arrive in order.
The idea I want to implement is the following: whenever I get a request from the client, I want to forward that request to the backend server and wait (doing nothing) until the backend server responds to me (the proxy), and then forward that response to the client (assume that both TCP connections are maintained in the common case).
There is one main problem that I am not sure how to solve. When I forward the request from the proxy to the server and get the response, how do I know when the server has sent me all the information I need, if I do not know beforehand the format of the data being sent from the server to the proxy (i.e. I don't know if the response from the server uses a type-length-value scheme, nor do I know if `\r\n` indicates the end of the message from the server)? I was told that I should assume that I have all the data from the server connection whenever my read from the TCP connection returns zero bytes or fewer bytes than I expected. However, this does not seem correct to me. The reason it might not be correct in general is the following:
Assume that the server for some reason is only writing to its socket one byte at a time, but the total length of the response to the "real" client is much, much longer. Isn't it then possible that when the proxy reads the TCP socket connected to the server, the proxy only reads one byte, and if it loops fast enough (doing a read before more data arrives), then reads zero and incorrectly concludes that it got the whole message the client was meant to receive?
One way to fix this might be to wait after each read from the socket, so that the proxy doesn't loop faster than it gets bytes. The reason I am worried is this: assume there is a network partition and I can't talk to the server anymore, but it is not disconnected from me long enough to time out the TCP connection. Isn't it then possible that I read from the TCP socket to the server again (faster than I get data), read zero, incorrectly conclude that it's all the data, and then send it back to the client? (Remember, the promise I want to keep is that I only send whole messages to the client when I write to the client connection. Thus, it is not acceptable for the proxy to read the connection again later, after it has already written to the client, and send the missing chunk at a later time, maybe during the response to a different request.)
The code that I have written is on the Go Playground.
The analogy that I like to use to explain why I think this method doesn't work is the following:
Say we have a cup, and the proxy drinks half the cup every time it does a read from the server, but the server only puts in one teaspoon at a time. Thus, if the proxy drinks faster than it gets teaspoons, it might reach zero too soon and conclude that its socket is empty and that it's OK to move on! Which is wrong if we want to guarantee we are sending full messages every time. Either this analogy is wrong and some "magic" from TCP makes it work, or the algorithm that reads until the socket is empty is just plain wrong.
A question that deals with a similar problem here suggests reading until EOF. However, I am unsure why that would be correct. Does reading EOF mean that I got the intended message? Is an EOF sent each time someone writes a chunk of bytes to a TCP socket (i.e. I am worried that if the server writes one byte at a time, it sends one EOF per byte)? Or is EOF part of the "magic" of how a TCP connection really works? Does sending an EOF close the connection? If it does, it's not a method I want to use. Also, I have no control over what the server might be doing (i.e. I do not know how often it wants to write to the socket to send data to the proxy; however, it's reasonable to assume it writes to the socket with some standard/normal socket-writing behaviour). I am just not convinced that reading until EOF on the socket from the server is correct. Why would it be? When can I even read to EOF? Are EOFs part of the data, or are they in the TCP header?
Also, regarding the idea I wrote about putting a wait just epsilon below the timeout: would that work in the worst case or only on average? I was also thinking: I realized that if the Wait() call is longer than the timeout, then if you return to the TCP connection and it doesn't have anything, it's safe to move on. However, if it doesn't have anything and we don't know what happened to the server, then we would time out anyway. So it's safe to close the connection (because the timeout would have done that anyway). Thus, I think that if the Wait call is at least as long as the timeout, this procedure does work! What do people think?
I am also interested in an answer that can justify why this algorithm might work in some cases. For example, I was thinking: even if the server only writes a byte at a time, if the deployment scenario is a tight data centre, then on average, because delays are really small and the wait call is almost certainly long enough, wouldn't this algorithm be fine?
Also, are there any risks of the code I wrote getting into a "deadlock"?
package main

import (
    "fmt"
    "net"
)

type Proxy struct {
    ServerConnection *net.TCPConn
    ClientConnection *net.TCPConn
}
func (p *Proxy) Proxy() {
    fmt.Println("Running proxy...")
    for {
        request := p.receiveRequestClient()
        p.sendClientRequestToServer(request)
        response := p.receiveResponseFromServer() //<--worried about this one.
        p.sendServerResponseToClient(response)
    }
}
func (p *Proxy) receiveRequestClient() (request []byte) {
    //assume this function is a black box and that it works.
    //maybe we know that the messages from the client always end in \r\n or
    //that they are length-prefixed.
    return
}

func (p *Proxy) sendClientRequestToServer(request []byte) {
    bytesSent := 0
    bytesToSend := len(request)
    for bytesSent < bytesToSend {
        // Write from the unsent portion so a short write does not resend
        // bytes that were already delivered.
        n, _ := p.ServerConnection.Write(request[bytesSent:])
        bytesSent += n
    }
    return
}
// Intended behaviour: waits until ALL of the response from the backend server is obtained.
// What it actually does: assumes that if it reads zero, the server has not yet
// written to the proxy, and therefore waits. However, once the first byte has been
// read, it keeps reading until it has extracted all the data from the server and the
// socket is "empty" (signaled by a zero-byte read in the second loop).
func (p *Proxy) receiveResponseFromServer() (response []byte) {
    buf := make([]byte, 4096)
    // Spin until the first chunk of the response arrives.
    bytesRead, _ := p.ServerConnection.Read(buf)
    for bytesRead == 0 {
        bytesRead, _ = p.ServerConnection.Read(buf)
    }
    response = append(response, buf[:bytesRead]...)
    // Keep reading until a read returns zero bytes, then assume the response is complete.
    for bytesRead != 0 {
        bytesRead, _ = p.ServerConnection.Read(buf)
        response = append(response, buf[:bytesRead]...)
        //Wait(n) could solve it here?
    }
    return
}
func (p *Proxy) sendServerResponseToClient(response []byte) {
    bytesSent := 0
    bytesToSend := len(response)
    for bytesSent < bytesToSend {
        n, _ := p.ClientConnection.Write(response[bytesSent:])
        bytesSent += n
    }
    return
}
func main() {
    proxy := &Proxy{}
    proxy.Proxy()
}
Unless you're working with a specific higher-level protocol, there is no "message" to read from the client to relay to the server. TCP is a stream protocol, and all you can do is shuttle bytes back and forth.
The good news is that this is amazingly easy in Go, and the core part of this proxy will be:
go io.Copy(server, client)
io.Copy(client, server)
This is obviously missing error handling, and doesn't shut down cleanly, but clearly shows how the core data transfer is handled.
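To make that concrete, here is a minimal sketch of a complete proxy built around those two io.Copy calls. The listen address ":8080" and the backend address "127.0.0.1:9000" are placeholders invented for this example, and the shutdown handling is deliberately simple: when either direction finishes, both connections are closed.

package main

import (
    "io"
    "log"
    "net"
)

// handle shuttles bytes in both directions until one side stops.
func handle(client net.Conn, backendAddr string) {
    defer client.Close()

    server, err := net.Dial("tcp", backendAddr)
    if err != nil {
        log.Println("dial backend:", err)
        return
    }
    defer server.Close()

    done := make(chan struct{}, 2)
    go func() { io.Copy(server, client); done <- struct{}{} }()
    go func() { io.Copy(client, server); done <- struct{}{} }()
    <-done // one direction finished; the deferred Close calls unblock the other
}

func main() {
    ln, err := net.Listen("tcp", ":8080") // placeholder listen address
    if err != nil {
        log.Fatal(err)
    }
    for {
        client, err := ln.Accept()
        if err != nil {
            log.Fatal(err)
        }
        go handle(client, "127.0.0.1:9000") // placeholder backend address
    }
}

Because io.Copy just moves bytes, this sidesteps the whole question of message boundaries: the proxy never needs to know where a response ends, it simply relays whatever each side writes.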
Related
I am creating a UDP proxy in Go, but while doing some load tests using iperf, I started to get this error:
socket: too many open files
After searching and testing, I found that if I create a pool using a map of open connections, with the key being *net.UDPAddr.String() and the value an instance of the UDP proxy containing a *net.UDPConn, I am able to reuse an existing connection when the client address is the same:
var clients map[string]*UDPProxy.UDPProxy = make(map[string]*UDPProxy.UDPProxy)
This block of code looks something like:
// wait for connections
for {
    n, clientAddr, err := conn.ReadFromUDP(buffer)
    if err != nil {
        log.Println(err)
    }
    counter++
    if *d {
        log.Printf("new connection from %s", clientAddr.String())
    }
    fmt.Printf("Connections: %d, clients: %d\n", counter, len(clients))
    proxy, found = clients[clientAddr.String()]
    if !found {
        // make new connection to remote server
        proxy = UDPProxy.New(conn, clientAddr, raddr_udp, *d)
        clients[clientAddr.String()] = proxy
    }
    go proxy.Start(buffer[0:n])
}
This seems to be working, but the problem I have now is that I need to find a way of expiring/cleaning the map when a client exits or is no longer using the proxy, so that I can avoid keeping multiple unused connections around.
Any idea how I could improve this, or even better, how I could replace the map entirely? I don't know if channels could be helpful here.
Thanks in advance.
Since you are creating UDP proxies, you probably know that you have to come up with your own solution for deciding when to "terminate" the proxy session. The session is just an abstraction when it comes to UDP - unless the UDPProxy package you're using has an established mechanism already.
Depending on why you are creating UDP proxies, it might be easy to arbitrarily clean up connections ...
So if you know that a client is exiting, call the Close() method on the proxy (assuming there is one) and use delete on the map entry.
How to decide that a client is exiting is up to you. You could use a slice as a FIFO, pick one randomly, or try setting timers for each.
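One common pattern, if the UDPProxy package doesn't already provide expiry, is to record a last-activity timestamp per client and sweep the map periodically. Here is a rough Go sketch of that idea; the session type, its Close method, and the two-minute idle limit are stand-ins invented for the example, since the real UDPProxy API isn't shown:

package main

import (
    "sync"
    "time"
)

// session stands in for a UDPProxy instance; Close is a hypothetical method
// that would release the upstream *net.UDPConn.
type session struct {
    lastSeen time.Time
}

func (s *session) Close() { /* close the upstream connection here */ }

type sessionTable struct {
    mu      sync.Mutex
    clients map[string]*session
}

// touch records activity for a client, creating the session if needed.
func (t *sessionTable) touch(addr string) *session {
    t.mu.Lock()
    defer t.mu.Unlock()
    s, ok := t.clients[addr]
    if !ok {
        s = &session{}
        t.clients[addr] = s
    }
    s.lastSeen = time.Now()
    return s
}

// sweep closes and deletes sessions that have been idle longer than maxIdle.
func (t *sessionTable) sweep(maxIdle time.Duration) {
    for range time.Tick(maxIdle / 2) {
        t.mu.Lock()
        for addr, s := range t.clients {
            if time.Since(s.lastSeen) > maxIdle {
                s.Close()
                delete(t.clients, addr)
            }
        }
        t.mu.Unlock()
    }
}

func main() {
    table := &sessionTable{clients: make(map[string]*session)}
    go table.sweep(2 * time.Minute) // arbitrary idle limit for the example
    // The real program would run its UDP read loop here and call
    // table.touch(clientAddr.String()) for every datagram received.
    select {}
}

In the receive loop from the question, you would call touch on every datagram rather than only inserting a client on first sight, so active clients never expire.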
I've been working on a TCP socket connection to a game server. The big problem here is that the game server sends the data without any separators - since it sends the packet length inside the data - making it impossible to use socket:receive("*a") or "*l". The data received from the server does not have a static size and is sent in hex format. I'm using this solution:
while true do
    local rect, r, st = socket.select({_S.sockets.main, _S.sockets.bulle}, nil, 0.2)
    for i, con in ipairs(rect) do
        resp, err, part = con:receive(1)
        if resp ~= nil then
            dataRecv = dataRecv..resp
        end
    end
end
As you can see, I can only get all the data from the socket by reading one byte at a time and appending it to a string, which is not a good approach since I have two sockets to read. Is there a better way to receive data from this socket?
I don't think there is any other option; usually in a situation like this the client reads a length field of known size to figure out how much it needs to read from the rest of the stream. Some protocols combine newlines and the length; for example, HTTP uses line separators for headers, with one of the headers specifying the length of the content that follows the headers.
Still, you don't need to read the stream one character at a time, as you can switch to a non-blocking read and request any number of characters. If there is not enough to read, you'll get the partially read content plus "timeout" signaled, which you can handle in your logic; from the documentation:
In case of error, the method returns nil followed by an error message
which can be the string 'closed' in case the connection was closed
before the transmission was completed or the string 'timeout' in case
there was a timeout during the operation. Also, after the error
message, the function returns the partial result of the transmission.
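For what it's worth, here is the same idea sketched in Go (the language used elsewhere on this page) rather than Lua: accumulate whatever the socket hands you, however small the chunks, and only extract a packet once the embedded length says it is complete. The 2-byte big-endian length header is an assumption made up for the example, since the game protocol isn't specified:

package main

import (
    "encoding/binary"
    "fmt"
)

// frameBuffer accumulates raw bytes and yields complete packets.
type frameBuffer struct {
    data []byte
}

// feed appends whatever the socket returned, however small the chunk.
func (b *frameBuffer) feed(chunk []byte) {
    b.data = append(b.data, chunk...)
}

// next returns one complete packet, or nil if more bytes are still needed.
// The 2-byte big-endian length prefix is made-up framing for illustration.
func (b *frameBuffer) next() []byte {
    if len(b.data) < 2 {
        return nil // length header not complete yet
    }
    n := int(binary.BigEndian.Uint16(b.data))
    if len(b.data) < 2+n {
        return nil // body not complete yet
    }
    packet := append([]byte(nil), b.data[2:2+n]...) // copy the packet out
    b.data = b.data[2+n:]
    return packet
}

func main() {
    var b frameBuffer
    // Simulate the server dribbling a 5-byte packet in three tiny reads.
    b.feed([]byte{0x00})
    b.feed([]byte{0x05, 'h', 'e'})
    fmt.Println(b.next() == nil) // true: only part of the body has arrived
    b.feed([]byte{'l', 'l', 'o'})
    fmt.Println(string(b.next())) // "hello"
}

The Lua version has the same shape: keep appending to dataRecv and only consume a packet once dataRecv holds at least as many bytes as the advertised length.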
We are writing a message broker in Haskell (HMB). Messages therefore have to be parsed (Data.Binary) after they are received from the socket (Network.Socket). We've been testing on loopback (localhost) so far - for producing and parsing messages - and this worked quite well. If we benchmark by producing messages from another machine we are facing problems: suddenly the parser does not have enough bytes to parse.
The first 4 bytes of each message define the length of the message and thus describe the message to be parsed. As hinted above, we do parsing with Data.Binary - so this is lazy. For testing purposes we switched parsing of the first 4 bytes to strict by using the cereal library. This had the same problem. We then even tried to completely parse the requests with cereal only, and the problem remains.
In the code you'll see that we do threading. However, we also tried without a channel (single threaded setup) but this didn't solve the problem either.
Here is a part of the code (Thread1) where the received bytes are written to a channel to be further consumed/parsed. (As mentioned, nothing changes if we omit channeling and directly parse input):
runConnection :: (Socket, SockAddr) -> RequestChan -> Bool -> IO()
runConnection conn chan False = return ()
runConnection conn chan True = do
  r <- recvFromSock conn
  case (r) of
    Left e -> do
      handleSocketError conn e
      runConnection conn chan False
    Right input -> do
      threadDelay 5000 -- THIS FIXES THE PROBLEM!?
      writeToReqChan conn chan input
      runConnection conn chan True
Here is the part (Thread2) where the input is being parsed:
runApiHandler :: RequestChan -> ResponseChan -> IO()
runApiHandler rqChan rsChan = do
  (conn, req) <- readChan rqChan
  case readRequest req of -- readRequest IS THE PARSER
    Left (bs, bo, e) -> handleHandlerError conn $ ParseRequestError e
    Right (bs, bo, rm) -> do
      res <- handleRequest rm
      case res of
        Left e -> handleHandlerError conn e
        Right bs -> writeToResChan conn rsChan bs
  runApiHandler rqChan rsChan
Now I have figured out that if the parsing process is delayed a bit (see threadDelay in the first code block), everything works fine. Which basically means the parser doesn't wait for the bytes received from the socket.
Why is that? Why does the parser not wait for the socket to have enough bytes? Is there a general mistake in our setup?
I would bet that the problem has nothing to do with the parser but is instead due to the blocking semantics of UNIX sockets.
While a loopback interface will likely pass the packet directly from the sender to the receiver, an Ethernet interface may need to break up the packet to fit in the Maximum Transmission Unit (MTU) of the link. This is known as packet fragmentation.
The len argument to the recv system call is merely an upper bound on the received length (e.g. the size of the target buffer); the call may produce less data than you ask for. To quote the manpage,
If no messages are available at the socket, the receive calls wait for a
message to arrive, unless the socket is nonblocking (see fcntl(2)), in which
case the value -1 is returned and the external variable errno is set to
EAGAIN or EWOULDBLOCK. The receive calls normally return any data
available, up to the requested amount, rather than waiting for receipt of
the full amount requested.
For this reason, you may need multiple recv calls to retrieve the entire packet. Your example works if you delay the recv as the operating system can reassemble the original packet since all fragments have arrived by the time it is requested.
As meiersi pointed out, there is a variety of streaming I/O libraries that have been developed in the Haskell world for solving this problem, among others. These include pipes, conduit, io-streams, and others. Depending upon your goals, this may be a natural way to handle this issue.
You might want to try the socket support in conduit-extra combined with binary-conduit to properly handle the parsing of the chunked streaming, which happens due to the reasons pointed out by bgamari.
First of all, consider yourself lucky to observe this. On many platforms perhaps only one out of a thousand packets exhibits this behaviour, causing a lot of such (sorry) bad networking code to fail rarely and randomly.
The problem is that you start processing before the data is ready. Instead of the threadDelay (which introduces a fixed delay and might not be long enough in all cases), the solution is to make sure you have at least one complete item/message/packet before you start processing it. Your protocol, where the first 32-bit word contains the length, is perfect for this. Read data until you have at least 4 bytes (the length). Then read data until you have the required number of bytes. If any call to recvFromSock returns less than the required number, call it again to get more. Remember to also handle the case of 0 bytes: it means the other party closed the connection.
I have implemented this for a similar protocol (SMPP; packets also start with the length) and it works perfectly.
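Expressed in Go (the language of the first question on this page) rather than Haskell, the read-until-you-have-enough loop this answer describes looks roughly like the sketch below. io.ReadFull plays the role of the repeated recvFromSock calls, and net.Pipe plus the byte-at-a-time writer are stand-ins for the slow remote machine:

package main

import (
    "encoding/binary"
    "fmt"
    "io"
    "net"
)

// readPacket reads one length-prefixed packet: a 4-byte big-endian length
// followed by that many payload bytes. io.ReadFull keeps calling Read until
// the requested number of bytes has arrived, so short reads are handled.
// An io.EOF from the first ReadFull means the peer closed the connection.
func readPacket(conn net.Conn) ([]byte, error) {
    var header [4]byte
    if _, err := io.ReadFull(conn, header[:]); err != nil {
        return nil, err
    }
    n := binary.BigEndian.Uint32(header[:])
    body := make([]byte, n)
    if _, err := io.ReadFull(conn, body); err != nil {
        return nil, err
    }
    return body, nil
}

func main() {
    // net.Pipe stands in for a real TCP connection in this example.
    client, server := net.Pipe()
    go func() {
        msg := []byte("hello broker")
        var header [4]byte
        binary.BigEndian.PutUint32(header[:], uint32(len(msg)))
        server.Write(header[:])
        // Deliver the body one byte at a time to mimic a slow sender.
        for i := range msg {
            server.Write(msg[i : i+1])
        }
        server.Close()
    }()
    body, err := readPacket(client)
    fmt.Println(string(body), err) // "hello broker <nil>"
}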
I am working on client-server programming; I am referring to this link, and my server is running successfully.
I need to send data continuously to the server.
I don't want to connect() again before sending each packet. So the first time I just create a socket and send the first packet; for the rest of the data I just use the write() function to write to the socket.
But my problem is that while sending data continuously, if the server is not there or my Ethernet is disabled, write() still successfully writes data to the socket.
Is there any method by which I can create the socket only once and send data continuously while detecting server failure?
The main reason for doing it like this is that on the server side I am using a GPRS modem, and each time connect() is called for a packet the modem hangs.
For creating the socket I am using the code below:
Gprs_sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (Gprs_sockfd < 0)
{
    Display("ERROR opening socket");
    return 0;
}
server = gethostbyname((const char*)ip_address);
if (server == NULL)
{
    Display("ERROR, no such host");
    return 0;
}
bzero((char *) &serv_addr, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
bcopy((char *)server->h_addr,(char *)&serv_addr.sin_addr.s_addr,server->h_length);
serv_addr.sin_port = htons(portno);

if (connect(Gprs_sockfd,(struct sockaddr *) &serv_addr,sizeof(serv_addr)) < 0)
{
    Display("ERROR connecting");
    return 0;
}
And each time I write to the socket using the code below:
n = write(Gprs_sockfd,data,length);
if(n<0)
{
    Display("ERROR writing to socket");
    return 0;
}
Thanks in advance.............
TCP was designed to tolerate temporary failures. It does byte sequencing, acknowledgments and, if necessary, retransmissions. All unacknowledged data is buffered inside the kernel network stack. If I remember correctly, the default is three retransmission attempts (somebody correct me if I'm wrong) with exponential back-off timeouts. That quickly adds up to dozens of seconds, if not minutes.
My suggestion would be to design application-level acknowledgments into your protocol, meaning the server would send a short reply saying that it has received that much data so far, say every second. If the client does not receive such an ack within, say, 3 seconds, the client knows the connection is unusable and can close it. By the way, this is easier done with non-blocking sockets and polling functions like select(2) or poll(2).
Edit 0:
I think this would be very relevant here - "The ultimate SO_LINGER page, or: why is my tcp not reliable".
Nikolai is correct here: the behaviour you are experiencing is desirable, as basically you can continue transferring data after a network outage without any extra logic in your application. If your application should detect outages longer than a specified amount of time, you need to add heartbeating to your protocol. This is the standard way of solving the problem. It also allows you to detect the situation where the network is all right and the receiver is alive, but it has deadlocked (due to a software bug).
Heartbeating can be as simple as mentioned by Nikolai - sending a small packet every X seconds; if the server doesn't see such a packet for N*X seconds, the connection is dropped.
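As a rough illustration of that heartbeating idea in Go (the snippets in this question are C, so this is a sketch of the technique, not of the questioner's program): the client sends a one-byte heartbeat every interval and expects some reply within a deadline; if no reply arrives, the connection is declared dead. The 'H' byte, the address, and the 1s/3s durations are made-up example values:

package main

import (
    "log"
    "net"
    "time"
)

// runWithHeartbeat writes a small heartbeat packet every interval and expects
// the server to answer each one. If no byte comes back within deadline, the
// connection is treated as dead and closed.
func runWithHeartbeat(conn net.Conn, interval, deadline time.Duration) error {
    defer conn.Close()
    buf := make([]byte, 1)
    for {
        if _, err := conn.Write([]byte{'H'}); err != nil {
            return err // local write error (rare: the send buffer usually absorbs writes)
        }
        conn.SetReadDeadline(time.Now().Add(deadline))
        if _, err := conn.Read(buf); err != nil {
            return err // timeout or closed: treat the peer as unreachable
        }
        time.Sleep(interval)
    }
}

func main() {
    conn, err := net.Dial("tcp", "127.0.0.1:5000") // placeholder address
    if err != nil {
        log.Fatal(err)
    }
    if err := runWithHeartbeat(conn, time.Second, 3*time.Second); err != nil {
        log.Println("connection lost:", err)
    }
}

A real program would interleave the heartbeats with its normal data and acks, but the detection logic is the same: silence beyond the deadline means the link is gone, however healthy the local write() calls look.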
I am starting to learn the TCP protocol from the internet and doing some experiments. I read this in an article at http://www.diffen.com/difference/TCP_vs_UDP :
"TCP is more reliable since it manages message acknowledgment and retransmissions in case of lost parts. Thus there is absolutely no missing data."
Then I do my experiment: I write a block of code with a TCP socket:
while( ! EOF (file))
{
    data = read_from(file, 5KB); //read 5KB from file
    write(data, socket);         //write data to socket to send
}
I think it's good because "TCP is reliable" and it "retransmits lost parts"... But it's not good at all. A small file is OK, but when it comes to about 2MB, sometimes it's OK but not always...
Now, I try another one:
while( ! EOF (file))
{
    wait_for_ACK(); //or sleep 5 seconds
    data = read_from(file, 5KB); //read 5KB from file
    write(data, socket);         //write data to socket to send
}
It's good now...
All I can think of is that the first one fails because of:
1. Buffer overflow on the sender, because the sending rate is slower than the rate at which the program writes (the sending rate is controlled by TCP).
2. Maybe the sending rate is greater than the program's writing rate, but some packets are lost (and after some retransmissions it still fails and TCP gives up...).
Any ideas?
Thanks.
TCP will ensure that you don't lose data, but you should check how many bytes actually got accepted for transmission... The typical loop is:
while (size > 0)
{
    int sz = send(socket, bufptr, size, 0);
    if (sz == -1) ... whoops, error ...
    size -= sz;
    bufptr += sz;
}
When the send call accepts some data from your program, it becomes the OS's job to get that data to the destination (including retransmission), but the buffer for sending may be smaller than the size you need to send, and that's why the resulting sz (number of bytes accepted for transmission) may be less than size.
It's also important to consider that sending is asynchronous: after the send function returns, the data is not already at the destination; it has only been handed to the TCP transport system to be delivered. If you want to know when it has been received, you'll have to use other mechanisms (e.g. a reply message from your counterpart).
You have to check write(socket) to make sure it writes what you ask.
Loop until you've sent everything or you've calculated a time out.
Do not use indefinite timeouts on socket read/write. You're asking for trouble if you do, especially on Windows.