vxworks 6.3 active sockets maxes out at 255? - sockets

I have an LPD server running on VxWorks 6.3. The client application (over which I have no control) sends me an LPQ query every tenth of a second. After 235 requests, the client receives a RST when trying to connect. After a while the device will again accept some queries (about 300), until it again starts sending out RSTs.
I have confirmed that it is the TCP stack that is causing the RST. There are some things that I have noticed.
1) I can somewhat change the number of sockets that will be accepted if I change the number of other applications that are running. For example, freeing up 4 sockets changed the number accepted from 235 to 239.
2) If I send requests to lpr (port 515) and another port (say, port 80), the total number of connections that are accepted before the RSTs start happening stays constant at 235.
3) There are lots of sockets sitting in TIME_WAIT.
4) I have a mock version of the client. If I slow the client down to one request every quarter second, the server doesn't reject the connections.
5) If I slow down the server's responses, I don't have any connections rejected.
So my theory is that there is some shared resource (my top guess is the total number of socket handles) that VxWorks limits at any given time. I'm also guessing that this limit tops out at 255.
Does anyone know how I can get VxWorks to accept more connections while still leaving them in TIME_WAIT when closed? I have looked through the kernel configuration and changed all the values that looked remotely likely, but I have not been able to change the number.
We know that we could set SO_LINGER, but this is not an acceptable solution, even though it does prevent the client connections from being rejected. We have also tried changing the timeout value for SO_LINGER, but that does not appear to be supported in VxWorks; it's either on or off.
Thanks!
Gail

To me it sounds like you are making a new connection for every LPQ query, and after the query is done you aren't closing the connection. In my opinion the correct thing to do is to accept one TCP connection and then use it for all of the LPQ queries; however, that may require mods to the client application. To avoid mods to the client, you should just close the TCP connection after each LPQ query.
Furthermore, you can set the maximum number of open FDs in VxWorks by adjusting the #define NUM_FILES in config.h (or configAll.h, or one of those files), but that will just postpone the error if you have an FD leak, which you probably do.
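As a rough illustration (the file name and default value vary by VxWorks version and BSP, so treat this as an assumption to check against your own kernel configuration), raising that limit looks something like this:

    /* config.h / configAll.h -- assumed location; verify against your BSP. */
    #undef  NUM_FILES
    #define NUM_FILES 512   /* max number of open file descriptors, sockets included */

Even with a higher limit, connections churning through TIME_WAIT still consume entries, so this only buys headroom rather than fixing a leak.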

Related

What is the overhead traffic of a TCP connection (plus TCP clarifications)?

We have a TCP connection.
Nothing is sent over it; how much traffic (in bytes) is needed each second to keep that connection open?
How long does it take to open a connection from a client in South America to a server in Northern Europe?
If I have to send a small amount of data (max 256 bytes) at intervals of x seconds, what would x have to be for it to be better to close the connection and reopen it instead of keeping the connection always open?
I do not expect exact data - estimates will suffice.
1) none.
2) some time. Try it and see. For a rough estimate, ping one end from the other and double it.
3) try it. It depends on bandwidth and, more importantly, latency. These vary over wide ranges. Usually, it's better, speed-wise, to keep connections open. 256 bytes at intervals of seconds? I would keep the connection open, especially over paths with possibly high latency (e.g. intercontinental).
1. According to the TCP/IP standard, nothing. However, depending on the network conditions and any middleboxes (NAT devices, firewalls, etc.), a connection with no data going over it may be dropped. That could be a static timeout (say two minutes, or ten minutes, or an hour), or it could be based on a least-recently-used table in some device.
2. It depends on a lot of factors, and the biggest delay may be from the client's local network rather than the intercontinental connection. However, the distance over the surface of the earth between the points is about 40 light-milliseconds, so (without TCP Fast Open) that would be 120 ms for the first data packet to get from the client to the server and 40 ms for the response, 80 ms more than on an already-open connection.
3. Assuming no broken middleboxes, it is always better to keep the connection open. However, the delay to recover from a "silently dropped" connection may be a lot longer than the time to open a new one; it might be appropriate for the client to manage its own timeout (on the order of a second or so), and open a new connection and retry the last message if it hasn't gotten a response by then (see the sketch below). It depends on what you're sending; transactional messages might merit such explicit fast retry more than a remote copy of syslog.
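As a rough illustration of that client-managed timeout, here is a minimal C sketch using SO_RCVTIMEO (the one-second value and the helper name are assumptions, not part of the original answer; a select()/poll() loop would work just as well):

    #include <errno.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/types.h>

    /* Send a request and wait up to ~1 s for the reply; on timeout the caller
     * can tear the connection down, reconnect, and retry the last message. */
    static int request_with_timeout(int fd, const void *req, size_t len,
                                    void *reply, size_t reply_len)
    {
        struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
        setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        if (send(fd, req, len, 0) < 0)
            return -1;

        ssize_t n = recv(fd, reply, reply_len, 0);
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
            return -1;   /* no response in time: treat as silently dropped */
        return (int)n;
    }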

Connection refused sockets. Normal behavior?

I have a socket server which accepts multiple connections from various clients. I'm testing it on localhost with a client application which connects, sends data, and closes the connection 10 times every 10 ms. Sometimes the test client raises an error: "Connection refused by the remote server" or something similar.
Is this normal behavior for the server application?
10 connects every 10 ms is one connection per millisecond, which is a rather fast rate. Are these connection attempts being made in parallel? If so, perhaps you are filling up the server's listen() backlog queue; IIRC, clients that try to connect while the backlog queue is full will get a connection-refused error.
To test that hypothesis, try passing in larger or smaller numbers as the second argument to listen() on your server, and see if that makes the connection-refused error occur more or less often.
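For reference, the backlog is just the second argument to listen(). A minimal C sketch of a server setup where you can vary that value (the port and backlog numbers are placeholders, and error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>

    int make_server(unsigned short port, int backlog)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);

        bind(fd, (struct sockaddr *)&addr, sizeof addr);
        listen(fd, backlog);   /* second argument: pending-connection queue length */
        return fd;
    }

Comparing, say, make_server(5000, 5) against make_server(5000, 128) should make the effect on the refusal rate visible.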
I'm with Jeremy. You didn't mention the protocol, but I assume it's SOCK_STREAM. It will take longer than 10 ms to do the TCP handshake on anything but the most local connection, eventually causing a backlog (and the subsequent connection-refused errors) no matter how high you set your listen backlog.
You'd be way ahead if you could keep the connection open, and not close it down during each of your computation cycles.

General overhead of creating a TCP connection

I'd like to know the general cost of creating a new connection, compared to UDP. I know TCP requires an initial exchange of packets (the 3-way handshake). What would the other costs be? For instance, is there some sort of magic needed in the kernel for setting up buffers, etc.?
The reason I'm asking is that I can keep an existing connection open and reuse it as needed. However, if there is little overhead to reconnecting, it would reduce complexity.
Once a UDP packet's been dumped onto the wire, the UDP protocol stack is free to completely forget about it. With TCP, there's at bare minimum the connection details (source/dest port and source/dest IP), the sequence number, the window size for the connection, etc. It's not a huge amount of data, but it adds up quickly on a busy server with many connections.
And then there's the 3-way handshake as well. Some braindead (and/or malicious) systems can abuse the process (look up 'SYN flood'), or just drop the connection on their end, leaving your system waiting for a response or close notice that'll never come. The plus side is that with TCP the system will do its best to make sure the packet gets where it has to. With UDP, there are no guarantees at all.
Compared to the latency of the packet exchange, all other costs such as kernel setup times are insignificant.
OPTION 1: The general costs of creating a TCP connection are:
Create socket connection
Send data
Tear down socket connection
Step 1: Requires an exchange of packets, so it's delayed by the round-trip network latency plus the destination server's service time. No significant CPU usage on either box is involved.
Step 2: Depends on the size of the message.
Step 3: IIRC, it just sends a 'closing now' packet, with no wait for the destination's ack, so no latency is involved.
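A minimal C sketch of those three steps (the IP, port, and message are placeholders; error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void tcp_once(const char *ip, unsigned short port, const char *msg)
    {
        /* Step 1: connect -- this is where the 3-way-handshake latency is paid. */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(port) };
        inet_pton(AF_INET, ip, &dst.sin_addr);
        connect(fd, (struct sockaddr *)&dst, sizeof dst);

        /* Step 2: send -- cost scales with the size of the message. */
        send(fd, msg, strlen(msg), 0);

        /* Step 3: tear down -- close() returns immediately; the FIN exchange
         * happens in the background. */
        close(fd);
    }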
OPTION 2: The costs of UDP:
Create UDP object
Send data
Close UDP object
Step 1: Requires minimal setup, no latency worries, very fast.
Step 2: BE CAREFUL OF SIZE, there is no retransmit in UDP since it doesn't care whether the packet was received by anyone or not. I've heard that the larger the message, the greater the probability of the data arriving corrupted, and that a rule of thumb is that you'll lose a certain percentage of messages over 20 MB.
Step 3: Minimal work, minimal time.
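And the UDP counterpart, under the same assumptions:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    void udp_once(const char *ip, unsigned short port, const char *msg)
    {
        /* Step 1: create the socket -- no handshake, no connection state. */
        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(port) };
        inet_pton(AF_INET, ip, &dst.sin_addr);

        /* Step 2: send one datagram -- nothing is retransmitted if it is lost. */
        sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&dst, sizeof dst);

        /* Step 3: close -- nothing to tear down on the wire. */
        close(fd);
    }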
OPTION 3: Use ZeroMQ Instead
You're comparing TCP to UDP with a goal of reducing reconnection time. THERE IS A NICE COMPROMISE: ZeroMQ sockets.
ZMQ allows you to set up a publishing socket where you don't care if anyone is listening (like UDP), and have multiple listeners on that socket. This is NOT a UDP socket - it's an alternative to both of these protocols.
See: ZeroMQ.org for details.
It's very high speed and fault tolerant, and is in increasing use in the financial industry for those reasons.
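As a hedged sketch (using the libzmq C API; the endpoint and message are placeholders), a PUB socket of the kind described above looks roughly like this:

    #include <string.h>
    #include <zmq.h>

    int main(void)
    {
        void *ctx = zmq_ctx_new();
        void *pub = zmq_socket(ctx, ZMQ_PUB);
        zmq_bind(pub, "tcp://*:5556");        /* subscribers may come and go freely */

        const char *msg = "status update";
        zmq_send(pub, msg, strlen(msg), 0);   /* fire-and-forget, as with UDP */

        zmq_close(pub);
        zmq_ctx_term(ctx);
        return 0;
    }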

When is TCP option SO_LINGER (0) required?

I think I understand the formal meaning of the option. In some legacy code I'm handling now, the option is used. The customer complains about getting a RST in response to the FIN it sends when it closes the connection from its side.
I am not sure I can remove it safely, since I don't understand when it should be used.
Can you please give an example of when the option would be required?
For my suggestion, please read the last section: “When to use SO_LINGER with timeout 0”.
Before we come to that, a little lecture about:
Normal TCP termination
TIME_WAIT
FIN, ACK and RST
Normal TCP termination
The normal TCP termination sequence looks like this (simplified):
We have two peers: A and B
A calls close()
A sends FIN to B
A goes into FIN_WAIT_1 state
B receives FIN
B sends ACK to A
B goes into CLOSE_WAIT state
A receives ACK
A goes into FIN_WAIT_2 state
B calls close()
B sends FIN to A
B goes into LAST_ACK state
A receives FIN
A sends ACK to B
A goes into TIME_WAIT state
B receives ACK
B goes to CLOSED state – i.e. is removed from the socket tables
TIME_WAIT
So the peer that initiates the termination – i.e. calls close() first – will end up in the TIME_WAIT state.
To understand why the TIME_WAIT state is our friend, please read section 2.7 in "UNIX Network Programming" third edition by Stevens et al (page 43).
However, lots of sockets in the TIME_WAIT state on a server can be a problem, as it could eventually prevent new connections from being accepted.
To work around this problem, I have seen many people suggest setting the SO_LINGER socket option with timeout 0 before calling close(). However, this is a bad solution as it causes the TCP connection to be terminated with an error.
Instead, design your application protocol so the connection termination is always initiated from the client side. If the client always knows when it has read all remaining data it can initiate the termination sequence. As an example, a browser knows from the Content-Length HTTP header when it has read all data and can initiate the close. (I know that in HTTP 1.1 it will keep it open for a while for a possible reuse, and then close it.)
If the server needs to close the connection, design the application protocol so the server asks the client to call close().
When to use SO_LINGER with timeout 0
Again, according to "UNIX Network Programming" third edition page 202-203, setting SO_LINGER with timeout 0 prior to calling close() will cause the normal termination sequence not to be initiated.
Instead, the peer setting this option and calling close() will send a RST (connection reset) which indicates an error condition and this is how it will be perceived at the other end. You will typically see errors like "Connection reset by peer".
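For concreteness, here is a minimal C sketch of that abortive close (the helper name is mine; the SO_LINGER option itself is standard):

    #include <sys/socket.h>
    #include <unistd.h>

    /* Abortive close: linger enabled with a zero timeout, then close().
     * The stack discards any unsent data and sends RST instead of FIN. */
    static void abortive_close(int fd)
    {
        struct linger lng = { .l_onoff = 1, .l_linger = 0 };
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lng, sizeof lng);
        close(fd);
    }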
Therefore, in the normal situation it is a really bad idea to set SO_LINGER with timeout 0 prior to calling close() – from now on called abortive close – in a server application.
However, certain situations warrant doing so anyway:
If a client of your server application misbehaves (times out, returns invalid data, etc.) an abortive close makes sense to avoid being stuck in CLOSE_WAIT or ending up in the TIME_WAIT state.
If you must restart your server application which currently has thousands of client connections, you might consider setting this socket option to avoid thousands of server sockets in TIME_WAIT (when calling close() from the server end), as this might prevent the server from getting available ports for new client connections after being restarted.
On page 202 in the aforementioned book it specifically says: "There are certain circumstances which warrant using this feature to send an abortive close. One example is an RS-232 terminal server, which might hang forever in CLOSE_WAIT trying to deliver data to a stuck terminal port, but would properly reset the stuck port if it got an RST to discard the pending data."
I would recommend this long article which I believe gives a very good answer to your question.
The typical reason to set a SO_LINGER timeout of zero is to avoid large numbers of connections sitting in the TIME_WAIT state, tying up all the available resources on a server.
When a TCP connection is closed cleanly, the end that initiated the close ("active close") ends up with the connection sitting in TIME_WAIT for several minutes. So if your protocol is one where the server initiates the connection close, and involves very large numbers of short-lived connections, then it might be susceptible to this problem.
This isn't a good idea, though - TIME_WAIT exists for a reason (to ensure that stray packets from old connections don't interfere with new connections). It's a better idea to redesign your protocol to one where the client initiates the connection close, if possible.
When linger is on but the timeout is zero the TCP stack doesn't wait for pending data to be sent before closing the connection. Data could be lost due to this but by setting linger this way you're accepting this and asking that the connection be reset straight away rather than closed gracefully. This causes an RST to be sent rather than the usual FIN.
Thanks to EJP for his comment, see here for details.
Whether you can remove the linger in your code safely or not depends on the type of your application: is it a "client" (opening TCP connections and actively closing them first) or is it a "server" (listening for a TCP open and closing after the other side initiated the close)?
If your application has the flavor of a "client" (closing first) AND you initiate and close a huge number of connections to different servers (e.g. when your app is a monitoring app supervising the reachability of a huge number of different servers), your app has the problem that all your client connections are stuck in the TIME_WAIT state. Then I would recommend shortening the timeout to a value smaller than the default, to still shut down gracefully but free up the client connection resources earlier. I would not set the timeout to 0, as 0 does not shut down gracefully with a FIN but abortively with a RST.
If your application has the flavor of a "client" and has to fetch a huge number of small files from the same server, you should not initiate a new TCP connection per file and end up with a huge number of client connections in TIME_WAIT; instead, keep the connection open and fetch all the data over the same connection. The linger option can and should be removed.
If your application is a "server" (closing second as a reaction to the peer's close), your connection is shut down gracefully on close() and resources are freed up, as you don't enter the TIME_WAIT state. Linger should not be used. But if your server app has a supervisory process detecting inactive open connections that have been idling for a long time ("long" is to be defined), you can shut down this inactive connection from your side - see it as a kind of error handling - with an abortive shutdown. This is done by setting the linger timeout to 0. close() will then send a RST to the client, telling him that you are angry :-)
I just saw that the WebSocket RFC (RFC 6455) explicitly states that the server should call close() on the TCP socket first(!)
I was in awe, as I hold the answer/posts by mgd in this thread as de facto correct, and the RFC clearly goes against that. But perhaps this would be a case where setting a linger time of 0 would be acceptable.
The underlying TCP connection, in most normal cases, SHOULD be closed first by the server, so that it holds the TIME_WAIT state and not the client.
I'm very interested to hear any thoughts/insight on this.
In servers, you may like to send a RST instead of a FIN when disconnecting misbehaving clients. That skips the FIN_WAIT and TIME_WAIT socket states on the server, which prevents the depletion of server resources and, hence, protects against this kind of denial-of-service attack.
I like Maxim's observation that DoS attacks can exhaust server resources. It also happens without an actually malicious adversary.
Some servers have to deal with the 'unintentional DoS attack' which occurs when the client app has a connection-leak bug, where it keeps creating a new connection for every new command it sends to your server, and then perhaps eventually closes its connections if it hits GC pressure, or perhaps the connections eventually time out.
Another scenario is the 'all clients have the same TCP address' scenario. Then client connections are distinguishable only by port numbers (if they connect to a single server), and if clients start rapidly cycling through opening and closing connections for any reason, they can exhaust the (client addr+port, server IP+port) tuple space.
So I think servers may be best advised to switch to the Linger-Zero strategy when they see a high number of sockets in the TIME_WAIT state - although it doesn't fix the client behavior, it might reduce the impact.
The listen socket on a server can use linger with time 0 so that it can bind back to the socket address immediately and reset any clients whose connections have not yet finished connecting. TIME_WAIT is only interesting when you have a multi-path network and can end up with mis-ordered packets, or are otherwise dealing with odd network packet ordering/arrival timing.

Sockets Async Connection

I am new to async socket connections. Can you please explain how this technology works?
There's an existing application (server) which requires socket connections to transmit data back and forth. I have already created my application (.NET), but the server application doesn't seem to understand the XML data that I am sending. My documentation gives me two ports: one to send and another one to receive.
I need to be sure that I understand how this works.
I got the IP addresses and also the two Ports to be used.
A socket is the most "raw" way to send byte-level TCP and UDP packets across a network.
For example, your browser uses a TCP socket connection to connect to the StackOverflow web server on port 80. Your browser and the server exchange commands and data according to an agreed-on structure/protocol (in this case, HTTP). An asynchronous socket is no different from a synchronous socket except that it does not block the thread that's using it.
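By way of analogy in C (the original question is about .NET, so treat this only as a sketch of the underlying idea, not of the .NET API), a non-blocking socket returns control immediately instead of stalling the calling thread; readiness is then checked later with select()/poll(), or, in .NET, delivered via a callback:

    #include <fcntl.h>

    /* Put a socket into non-blocking mode so connect()/recv() return
     * immediately instead of blocking the calling thread. */
    static void make_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }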
This is really not the ideal way to work (check and see if your server/vendor application supports SOAP/Web Services, etc.), but if this is really the only way, there could be a number of reasons why it's failing. To name a few...
Not actually getting connected or sending data. Run a test using WinsockTool (http://www.isatools.org/tools/winsocktool.msi) and simulate your client first to make sure the server is working as expected.
Encoding incorrect - You're sending raw bytes across the network... Make sure you're using the correct encoding to convert your XML into bytes (ASCII, UTF8, etc).
Buffer Length - Your sending buffer (the amount of data you can transmit in one shot) may be too small, or the server may expect content of a certain length, and your XML could be getting truncated.
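On that last point, a common cause of truncation is assuming a single send()/receive call moves the whole message; a stream socket is free to split it. A hedged C sketch of the usual sender-side fix (the receiver needs a matching loop, framed by a length prefix or delimiter):

    #include <sys/socket.h>
    #include <sys/types.h>

    /* Keep calling send() until the whole buffer has gone out; a single call
     * on a stream socket may transmit only part of it. */
    static ssize_t send_all(int fd, const char *buf, size_t len)
    {
        size_t sent = 0;
        while (sent < len) {
            ssize_t n = send(fd, buf + sent, len - sent, 0);
            if (n <= 0)
                return -1;   /* error (or connection closed) */
            sent += (size_t)n;
        }
        return (ssize_t)sent;
    }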
let's break a misconception... sockets are FULL-DUPLEX: you connect to a server using one port, then you can send AND receive data through the same socket, no need for 2 port numbers. (actually, there is a port assigned for receiving data, but it is: 1. assigned automatically when creating the socket (unless told otherwise) and 2. of no use in the function calls to receive data)
so you tell us that your documentation gives you 2 port numbers... i assume that the "server" is an already existing in-house application, and you are trying to talk to it. if the doc lists 2 ports, then you will need 2 sockets: one for sending and another one for receiving. now i would suggest you first use a synchronous socket before trying the async way: a synchronous socket is less error-prone for a first test.
(by the way, let's break another misconception: if well coded, once a server listens on a port, it can receive any number of connections through the same port number, no need to open 2 listening ports to accept 2 connections... sorry for the re-alignment, but i've seen those 2 errors committed enough times, it gives me an urge to kill)
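To illustrate that last point, here is a minimal C sketch: one listening socket, one port, any number of accepted client connections (the port and backlog are placeholders, and error handling is omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* One listening port, any number of clients: each accept() returns a
     * distinct connected socket, all sharing the same server-side port. */
    void serve(unsigned short port)
    {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(port);
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 16);

        for (;;) {
            /* cfd is full-duplex: both send() and recv() work on it. */
            int cfd = accept(lfd, NULL, NULL);
            /* ... hand cfd to a worker or event loop, then ... */
            close(cfd);
        }
    }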