Why is RIOSend slower than classic winapi socket send?

I am testing two apps with TCP sockets. The first app uses the newer RIO (Registered I/O) calls; the second uses classic Winsock calls. Both apps send a single string to a server. I made both use non-blocking sockets and disabled the Nagle algorithm. When I measure the time it takes to execute the sends, the classic socket is, contrary to my expectations, faster. What is the reason for this?
Here is my code:
/*
Classic socket
*/
SOCKET ConnectSocket = INVALID_SOCKET;
// ... socket initialization
// Make socket non-blocking and disable Nagle's algorithm
unsigned long mode = 1;
ioctlsocket(ConnectSocket, FIONBIO, &mode);
int isOn = 1;
setsockopt(ConnectSocket, IPPROTO_TCP, TCP_NODELAY, (char *) &isOn, sizeof(int) );
// Measure execution time in TSC ticks (dividing by 3000 assumes a 3 GHz TSC, giving microseconds)
t1 = __rdtscp(&dum);
iResult = send(ConnectSocket, sendbuf, (int)strlen(sendbuf), 0);
t2 = __rdtscp(&dum);
cout << (t2 - t1) / 3000.0f << " 10^-6 sec" << endl;
/*
RIO socket
*/
SOCKET s = WSASocket(
AF_INET,
SOCK_STREAM,
IPPROTO_TCP,
NULL,
0,
WSA_FLAG_REGISTERED_IO);
// ... socket initialization
// Make socket non-blocking and disable Nagle's algorithm
unsigned long mode = 1;
ioctlsocket(s, FIONBIO, &mode);
int isOn = 1;
setsockopt(s, IPPROTO_TCP, TCP_NODELAY, (char *)&isOn, sizeof(int));
// RIO completions are signalled through an event object here (RIO_EVENT_COMPLETION)
hEvent = WSACreateEvent();
RIO_NOTIFICATION_COMPLETION type;
type.Type = RIO_EVENT_COMPLETION;
type.Event.EventHandle = hEvent;
type.Event.NotifyReset = TRUE;
RIO_CQ complQueue = rioFuncTable.RIOCreateCompletionQueue(1000, &type);
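RIO_RQ requestQueue = rioFuncTable.RIOCreateRequestQueue(
s,          // Socket
10,         // MaxOutstandingReceive
1,          // MaxReceiveDataBuffers
10,         // MaxOutstandingSend
1,          // MaxSendDataBuffers
complQueue, // ReceiveCQ
complQueue, // SendCQ
NULL);      // SocketContext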
// Init rio buffers
char *pSndBuffer = new char[sndBufferSize + 1]();
std::copy(message, message + sndBufferSize, pSndBuffer); // copy the message for the server into the send buffer
RIO_BUFFERID sndBufID = rioFuncTable.RIORegisterBuffer(pSndBuffer, sndBufferSize);
RIO_BUF sndBuffer;
sndBuffer.BufferId = sndBufID;
sndBuffer.Offset = 0;
sndBuffer.Length = sndBufferSize;
// Measure execution time in TSC ticks (dividing by 3000 assumes a 3 GHz TSC, giving microseconds)
t1 = __rdtscp(&dum);
rioFuncTable.RIOSend(requestQueue, &sndBuffer, 1, 0, NULL);
t2 = __rdtscp(&dum);
cout << (t2 - t1) / 3000.0f << " 10^-6 sec" << endl;
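The timing code above divides __rdtscp() ticks by 3000, which only yields microseconds if the TSC runs at exactly 3 GHz. A portable alternative, sketched here assuming C++11 or later (the helper name elapsedMicros is hypothetical), times the call with std::chrono::steady_clock instead:

```cpp
#include <cassert>
#include <chrono>
#include <utility>

// Run fn once and return the elapsed wall-clock time in microseconds.
// Unlike dividing __rdtscp() ticks by a hard-coded 3000, this does not
// assume any particular CPU/TSC frequency.
template <typename F>
double elapsedMicros(F&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

For example, elapsedMicros([&] { send(ConnectSocket, sendbuf, (int)strlen(sendbuf), 0); }) returns the duration of the send() call. Either way, this only measures how long the call takes to return, not when the data actually leaves the machine or reaches the server.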

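RIOSend() performs the actual send, whereas send() only memcpy()s the buffer into the socket's send (tx) buffer and returns; the real transmission happens afterwards. So the two measurements do not cover the same amount of work.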

Related

What buffer collects the data sent through TCP sockets on localhost?

I have a client and server connected through TCP sockets on localhost.
I checked with getsockopt that the server's SO_SNDBUF and the client's SO_RCVBUF are small (in my case both are 64KB).
I send twenty 500KB buffers from the server to the client, but in the client I've added a 500ms sleep after each recv() and capped the client's receive buffer (the one passed to recv()) at 1MB.
What I observe is that the server very quickly rids itself of the 10MB of data, which then arrives at the client over the next several seconds. 7-8MB are consistently in the "ether" in my experiments.
My question is what is this "ether"? It's obviously some buffer somewhere but can one tell which buffer it is?
Here is my test program.
#include <sys/socket.h>
#include <arpa/inet.h>
#include <unistd.h>
#include <chrono>
#include <cstdint>
#include <thread>
#include <cstdio>
#include <vector>
#include <cstdlib>
#define PROXY 0
static std::vector<uint8_t> getRandomBuf() {
std::vector<uint8_t> buf;
buf.reserve(500 * 1024);
for (size_t i = 0; i < buf.capacity(); ++i) buf.push_back(rand() % 256);
return buf;
}
int server() {
auto sd = socket(AF_INET, SOCK_STREAM, 0);
if (sd < 0) return puts("socket fail");
sockaddr_in srv = {};
srv.sin_family = AF_INET;
srv.sin_addr.s_addr = INADDR_ANY;
srv.sin_port = htons(7654);
int enable = 1;
if (setsockopt(sd, SOL_SOCKET, SO_REUSEADDR, &enable, sizeof(int)) < 0) {
return puts("setsockopt fail");
}
if (bind(sd, (sockaddr*)&srv, sizeof(srv)) < 0) {
return puts("bind fail");
}
listen(sd, 3);
puts("listening...");
sockaddr_in client;
socklen_t csz = sizeof(client);
auto sock = accept(sd, (sockaddr*)&client, &csz);
if (sock < 0) return puts("accept fail");
{
int data;
socklen_t size = sizeof(data);
getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &data, &size);
printf("accepted: %d\n", int(data));
}
for (int i=0; i<20; ++i) {
auto buf = getRandomBuf();
puts("Server sending blob");
send(sock, buf.data(), buf.size(), 0);
puts(" Server completed send of blob");
}
while (true) std::this_thread::yield();
return close(sock);
}
int client() {
int sd = socket(AF_INET, SOCK_STREAM, 0);
if (sd < 0) return puts("socket fail");
sockaddr_in client = {};
client.sin_family = AF_INET;
client.sin_addr.s_addr = inet_addr("127.0.0.1");
#if PROXY
client.sin_port = htons(9654);
#else
client.sin_port = htons(7654);
#endif
if (connect(sd, (sockaddr*)&client, sizeof(client)) < 0) {
return puts("connect fail");
}
{
int data;
socklen_t size = sizeof(data);
getsockopt(sd, SOL_SOCKET, SO_RCVBUF, &data, &size);
printf("connected: %d\n", int(data));
}
std::vector<uint8_t> buf(1024*1024);
while (true) {
auto s = recv(sd, buf.data(), buf.size(), 0);
if (s <= 0) {
puts("recv fail");
break;
}
printf("Client received %.1f KB\n", double(s)/1024);
#if !PROXY
std::this_thread::sleep_for(std::chrono::milliseconds(500));
#endif
}
return close(sd);
}
int main() {
std::thread srv(server);
std::this_thread::sleep_for(std::chrono::milliseconds(300)); // give time for the server to start
client();
srv.join();
return 0;
}
Note that in the test program there is a #define PROXY 0.
In another experiment with PROXY set to 1, I ditch the sleep and instead connect the client to a throttling proxy (Charles) and throttle the bandwidth to 400KB/s. In this case the server rids itself of the 10MB almost immediately, and the data arrives at the client over the course of ~20 seconds. I assume that the proxy is buffering, though I don't see a buffer-size setting in this particular one.
This was all done while hunting for another (likely bufferbloat) issue, in which the server sends 10MB in 20 chunks from Denver to Amsterdam over an Internet connection that does indeed have 400KB/s of bandwidth. In this case the server, much like in the throttling-proxy example above, rids itself of the 10MB almost immediately, and the data arrives at the client over the next 20 seconds, leading to 20-second delays for any subsequent messages. Had the data not left the server, I would have been able to reorder the chunks and send higher-priority messages in between the ones from the 10MB blob, and the client would not have suffered a 20-second delay due to the network clog.
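One way to narrow down where the "ether" is, at least on Linux, is to poll how many bytes each socket's kernel queues hold while the test runs. This is a Linux-specific sketch and an assumption on my part, not part of the original experiment; queuedForSend and queuedForRecv are hypothetical helper names. FIONREAD and TIOCOUTQ are the same ioctl values that <linux/sockios.h> and tcp(7) call SIOCINQ and SIOCOUTQ.

```cpp
#include <cassert>
#include <sys/ioctl.h>   // ioctl(), FIONREAD, TIOCOUTQ
#include <sys/socket.h>  // the descriptors passed in are assumed to be sockets

// Bytes still held in this socket's kernel send queue. For TCP, tcp(7)
// describes SIOCOUTQ (== TIOCOUTQ) as the amount of unsent data in the
// send queue. Returns -1 if the ioctl fails (e.g. on a listening socket).
int queuedForSend(int fd) {
    int bytes = 0;
    return ioctl(fd, TIOCOUTQ, &bytes) == 0 ? bytes : -1;
}

// Bytes waiting in this socket's kernel receive queue that the application
// has not recv()'d yet (SIOCINQ == FIONREAD). Returns -1 on failure.
int queuedForRecv(int fd) {
    int bytes = 0;
    return ioctl(fd, FIONREAD, &bytes) == 0 ? bytes : -1;
}
```

For example, printing queuedForSend(sock) in the server loop and queuedForRecv(sd) in the client loop after each send()/recv() would show how much of the 10MB sits in the server's send queue and in the client's receive queue; bytes that show up in neither are presumably held somewhere else along the path (such as the proxy).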

Questions about select()

Consider the following class method:
void TCPServer::listenWithTimeout() {
fd_set descrSet;
struct timeval timeout = {1, 0};
while (listeningThread.active) {
std::cout << "TCP Server thread is listening..." << std::endl;
FD_ZERO(&descrSet);
FD_SET(mySocketFD, &descrSet);
int descrAmount = select(mySocketFD + 1, &descrSet, NULL, NULL, &timeout);
if (descrAmount > 0) {
assert(FD_ISSET(mySocketFD, &descrSet));
int threadIndx = findAvailableThread();
if (threadIndx < 0) {
std::cout << "Connection not accepted: no threads available..." << std::endl;
} else {
joinThread(threadIndx);
int newSocketFD = accept(mySocketFD, (struct sockaddr *) &mySocket, &clieAddrssLen);
std::cout << "TCP Client connected..." << std::endl;
connectionThreads[threadIndx].thread = std::thread(TCPServer::startTCPConnectionWithException, std::ref(*this), threadIndx, newSocketFD);
connectionThreads[threadIndx].active = true;
}
}
}
std::cout << "TCP Server thread is terminating..." << std::endl;
}
Here are some questions:
When there are no available threads (findAvailableThread() returns -1), is it normal behaviour that select() doesn't wait for its timeout, so the while loop iterates really fast until a thread becomes available?
If yes, how could I avoid these really fast iterations? Instead of using something like a simple sleep() right after the "Connection not accepted" message inside the if branch, is there a way to let select() restore its timeout? Or, better still, is there a way to completely reject the pending incoming connection?
When there are no available threads (findAvailableThread() returns -1), is it normal behaviour that select() doesn't wait for its timeout, so the while loop iterates really fast until a thread becomes available?
Yes, because under that condition, you are not calling accept(), so you are not changing the listening socket's state. It will remain in a readable state as long as it has a client connection waiting to be accept()'ed.
if yes, how could I avoid these really fast iterations?
Call accept() before checking for an available thread. If no thread is available, close the accepted connection.
Instead of using something like a simple sleep() right after the "Connection not accepted" message inside the if branch, is there a way to let select() restore its timeout?
The only way is to accept() the connection that put the listening socket into a readable state, so it has a chance to go back to a non-readable state. The timeout will not apply again until the socket is no longer in a readable state.
Or, better still, is there a way to completely reject the pending incoming connection?
The only way is to accept() it first, then you can close() it if needed.
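Try this (note that timeout is now initialized inside the loop: on Linux, select() modifies the timeval to reflect the time not slept, so a timeval declared once outside the loop can drop to zero and make select() return immediately):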
void TCPServer::listenWithTimeout() {
fd_set descrSet;
while (listeningThread.active) {
std::cout << "TCP Server thread is listening..." << std::endl;
FD_ZERO(&descrSet);
FD_SET(mySocketFD, &descrSet);
struct timeval timeout = {1, 0};
int descrAmount = select(mySocketFD + 1, &descrSet, NULL, NULL, &timeout);
if (descrAmount > 0) {
assert(FD_ISSET(mySocketFD, &descrSet));
int newSocketFD = accept(mySocketFD, (struct sockaddr *) &mySocket, &clieAddrssLen);
if (newSocketFD != -1) {
int threadIndx = findAvailableThread();
if (threadIndx < 0) {
close(newSocketFD);
std::cout << "Connection not accepted: no threads available..." << std::endl;
} else {
joinThread(threadIndx);
std::cout << "TCP Client connected..." << std::endl;
connectionThreads[threadIndx].thread = std::thread(TCPServer::startTCPConnectionWithException, std::ref(*this), threadIndx, newSocketFD);
connectionThreads[threadIndx].active = true;
}
}
}
}
std::cout << "TCP Server thread is terminating..." << std::endl;
}
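The corrected loop above re-initializes both the fd_set and the timeout before every select() call. The same idea as a small standalone helper, sketched assuming POSIX select() (waitReadable is a hypothetical name):

```cpp
#include <cassert>
#include <sys/select.h>
#include <unistd.h>  // pipe() and write() are only used in the usage example

// Wait up to timeoutMs milliseconds for fd to become readable.
// The fd_set and timeval are rebuilt on every call because select()
// overwrites the set with the ready descriptors and, on Linux, may also
// overwrite the timeout with the time that was not slept.
// Returns >0 if readable, 0 on timeout, -1 on error.
int waitReadable(int fd, int timeoutMs) {
    fd_set readSet;
    FD_ZERO(&readSet);
    FD_SET(fd, &readSet);
    struct timeval timeout;
    timeout.tv_sec = timeoutMs / 1000;
    timeout.tv_usec = (timeoutMs % 1000) * 1000;
    return select(fd + 1, &readSet, nullptr, nullptr, &timeout);
}
```

With a pipe, for example, waitReadable(fds[0], 50) returns 0 after about 50 ms while nothing has been written, and returns 1 once a byte is available. Note that, as the answer explains, a listening socket with a pending connection stays readable until it is accept()'ed, so a fresh timeout alone does not stop the fast iterations.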

SCTP: What should the sctp_status.sstat_state value of an SCTP socket be after a successful connect() call?

I'm trying to connect via SCTP to a remote peer (to which I have no direct access other than connecting to it via a socket and ping). Assuming that I have connected successfully, what should the value of my sctp_status.sstat_state be if I call getsockopt()? Mine is SCTP_COOKIE_ECHOED (3) according to sctp.h. Is that correct? Shouldn't it be SCTP_ESTABLISHED?
I ask because I tried sending a message to the remote peer with this code:
ret = sctp_sendmsg (connSock, (void *) data, (size_t) strlen (data), (struct sockaddr *) &servaddr, sizeof (servaddr), 46, 0, 0, 0, 0);
It returned the number of bytes I tried to send. Then, when I tried to catch any response:
ret = sctp_recvmsg (connSock, (void *) reply, sizeof (reply), NULL,
NULL, NULL, &flags);
It returns -1 with errno ECONNRESET (104). What are the possible mistakes in my code, or maybe in my flow? Did I miss something?
Thanks in advance for answering. Will gladly appreciate that. :)
Update: Below is my client code for connecting to the remote peer. It's actually a Node addon, since SCTP is not fully supported in Node; I use the lksctp-tools package for the headers.
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <netinet/in.h>
#include <netinet/sctp.h>
#include <arpa/inet.h>
#include <signal.h>
#define MAX_BUFFER 1024
int connSock = 0;
int connect(char host[], int port, char remote_host[], int remote_port, int timeout) {
int ret, flags;
fd_set rset, wset;
struct sockaddr_in servaddr;
struct sockaddr_in locaddr;
struct sctp_initmsg initmsg;
struct timeval tval;
struct sctp_status status;
socklen_t opt_len;
errno = 0;
connSock = socket (AF_INET, SOCK_STREAM, IPPROTO_SCTP);
if (connSock == -1)
{
return (-1);
}
// Non-blocking mode for the connect() with timeout below
flags = fcntl(connSock, F_GETFL, 0);
fcntl(connSock, F_SETFL, flags | O_NONBLOCK);
memset(&locaddr, 0, sizeof(locaddr));
locaddr.sin_family = AF_INET;
locaddr.sin_port = htons(port);
locaddr.sin_addr.s_addr = inet_addr(host);
ret = bind(connSock, (struct sockaddr *)&locaddr, sizeof(locaddr));
if (ret == -1)
{
return (-1);
}
memset (&initmsg, 0, sizeof (initmsg));
initmsg.sinit_num_ostreams = 5;
initmsg.sinit_max_instreams = 5;
initmsg.sinit_max_attempts = 10;
ret = setsockopt(connSock, IPPROTO_SCTP, SCTP_INITMSG, &initmsg, sizeof(initmsg));
if (ret == -1)
{
return (-1);
}
memset (&servaddr, 0, sizeof (servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons (remote_port);
servaddr.sin_addr.s_addr = inet_addr (remote_host);
if((ret = connect (connSock, (struct sockaddr *) &servaddr, sizeof (servaddr))) < 0)
if (errno != EINPROGRESS)
return (-1);
if (ret == 0) {
fcntl(connSock, F_SETFL, flags);
return 0;
}
FD_ZERO(&rset);
FD_SET(connSock, &rset);
wset = rset;
tval.tv_sec = timeout;
tval.tv_usec = 0;
ret = select(connSock+1, &rset, &wset, NULL, timeout ? &tval : NULL);
if (ret == 0) {
close(connSock);
errno = ETIMEDOUT;
return(-1);
}
else if (ret < 0) {
return(-1);
}
fcntl(connSock, F_SETFL, flags);
opt_len = (socklen_t) sizeof(struct sctp_status);
getsockopt(connSock, IPPROTO_SCTP, SCTP_STATUS, &status, &opt_len);
printf ("assoc id = %d\n", status.sstat_assoc_id);
printf ("state = %d\n", status.sstat_state);
printf ("instrms = %d\n", status.sstat_instrms);
printf ("outstrms = %d\n", status.sstat_outstrms);
return 0;
}
int sendMessage(char remote_host[], int remote_port, char data[]) {
int ret, flags;
struct sockaddr_in servaddr;
char reply[1024];
errno = 0;
memset (&servaddr, 0, sizeof (servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons (remote_port);
servaddr.sin_addr.s_addr = inet_addr (remote_host);
printf("\nSending %s (%zu bytes)", data, strlen(data));
ret = sctp_sendmsg (connSock, (void *) data, (size_t) strlen (data),
(struct sockaddr *) &servaddr, sizeof (servaddr), 46, 0, 0, 0, 0);
if (ret == -1)
{
printf("\nError sending errno(%d)", errno);
return -1;
}
else {
ret = sctp_recvmsg (connSock, (void *) reply, sizeof (reply) - 1, NULL,
NULL, NULL, &flags);
if (ret == -1)
{
printf("\nError receiving errno(%d)", errno);
return -1;
}
else {
reply[ret] = '\0'; // sctp_recvmsg() does not NUL-terminate the data
printf("\nServer replied with %s", reply);
return 0;
}
}
}
int getSocket() {
return connSock;
}
I don't know if there's anything important I need to set before connecting that I've missed. I put the snippet together from different sources, so it's quite messy.
Another update, here's the tshark log of that code when executed:
3336.919408 local -> remote SCTP 82 INIT
3337.006690 remote -> local SCTP 810 INIT_ACK
3337.006727 local -> remote SCTP 774 COOKIE_ECHO
3337.085390 remote -> local SCTP 50 COOKIE_ACK
3337.086650 local -> remote SCTP 94 DATA
3337.087277 remote -> local SCTP 58 ABORT
3337.165266 remote -> local SCTP 50 ABORT
It looks like the remote side sent its COOKIE_ACK chunk but my client failed to set its state to ESTABLISHED (I double-checked that the sstat_state value is 3 here).
If the association setup process had completed, the state would be SCTP_ESTABLISHED. SCTP_COOKIE_ECHOED indicates that the association has not been completely established: the originating side (your localhost in this case) has sent the COOKIE_ECHO chunk (once or several times) and it has not yet been acknowledged by a COOKIE_ACK from the remote end.
You can send messages in this state (SCTP simply buffers them until it gets the COOKIE_ACK and sends them afterwards).
It is hard to say what went wrong based on the information you provided. At this stage it is probably worth digging into a Wireshark trace to see what the remote side replies to your COOKIE_ECHO.
Also, if you can share your client/server code, that might help identify the root cause.
UPDATE #1:
It should also be noted that the remote application can abort the association itself (e.g. if this association is not configured on that server). If you are trying to connect to an arbitrary server (rather than your own), that is quite possible and actually makes sense in your case. In this case the association on your side is in COOKIE_ECHOED because the COOKIE_ACK has not arrived yet (just a race condition). As I said previously, SCTP happily accepts your data in this state and just buffers it until it receives the COOKIE_ACK. SCTP on the remote side sends the COOKIE_ACK straight away, even before the application returns from accept(). If the application decides to terminate the association ungracefully, it sends an ABORT (that is the first ABORT in your Wireshark trace). Your side has not received this ABORT yet and sends a DATA chunk. Since the remote side considers this association already terminated, it cannot process the DATA chunk, so it treats it as an "out of the blue" packet (see RFC 4960, section 8.4) and sends another ABORT with the T bit set to 1.
I guess this is what happened in your case. You can easily confirm it by looking at the Wireshark trace.

The UNP book's single-threaded server with select

In the book "UNIX Network Programming", 3rd edition, Vol. 1, Section 6.8 "TCP Echo Server (Revisited)" of Chapter 6 "I/O Multiplexing: The select and poll Functions", the authors write:
"Unfortunately, there is a problem with the server that we just showed. Consider what happens if a malicious client connects to the server, sends one byte of data (other than a newline), and then goes to sleep. The server will call read, which will read the single byte of data from the client and then block in the next call to read, waiting for more data from this client. The server is then blocked ('hung' may be a better term) by this one client and will not service any other clients (either new client connection or existing clients' data) until the malicious client either sends a newline or terminates."
However, I doubt that this is what actually happens. If the "malicious" client is asleep when select() is called the second time, the corresponding socket descriptor will not be in the ready-for-reading state, so read() never gets the opportunity to block the single-threaded server. To verify this, I ran the sample server and a "malicious" client, and found that the server is not blocked and responds to other clients normally.
I admit that, when combined with I/O multiplexing calls such as select() or epoll(), non-blocking I/O is recommended. But my question is: is there something wrong with the book's conclusion? Or are there conditions that may occur in real applications but not in this simple example? Or is there something wrong with my code? Thank you very much!
the sample server code(tcpservselect01.c):
#include "unp.h"
int
main(int argc, char **argv)
{
int i, maxi, maxfd, listenfd, connfd, sockfd;
int nready, client[FD_SETSIZE];
ssize_t n;
fd_set rset, allset;
char buf[MAXLINE];
socklen_t clilen;
struct sockaddr_in cliaddr, servaddr;
listenfd = Socket(AF_INET, SOCK_STREAM, 0);
bzero(&servaddr, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_addr.s_addr = htonl(INADDR_ANY);
servaddr.sin_port = htons(SERV_PORT);
Bind(listenfd, (SA *) &servaddr, sizeof(servaddr));
Listen(listenfd, LISTENQ);
maxfd = listenfd; /* initialize */
maxi = -1; /* index into client[] array */
for (i = 0; i < FD_SETSIZE; i++)
client[i] = -1; /* -1 indicates available entry */
FD_ZERO(&allset);
FD_SET(listenfd, &allset);
for ( ; ; ) {
rset = allset; /* structure assignment */
nready = Select(maxfd+1, &rset, NULL, NULL, NULL);
if (FD_ISSET(listenfd, &rset)) {/* new client connection */
clilen = sizeof(cliaddr);
connfd = Accept(listenfd, (SA *) &cliaddr, &clilen);
for (i = 0; i < FD_SETSIZE; i++)
if (client[i] < 0) {
client[i] = connfd; /* save descriptor */
break;
}
if (i == FD_SETSIZE)
err_quit("too many clients");
FD_SET(connfd, &allset);/* add new descriptor to set */
if (connfd > maxfd)
maxfd = connfd; /* for select */
if (i > maxi)
maxi = i; /* max index in client[] array */
if (--nready <= 0)
continue; /* no more readable descriptors */
}
for (i = 0; i <= maxi; i++) {/* check all clients for data */
if ( (sockfd = client[i]) < 0)
continue;
if (FD_ISSET(sockfd, &rset)) {
if ( (n = Read(sockfd, buf, MAXLINE)) == 0) {
/* connection closed by client */
Close(sockfd);
FD_CLR(sockfd, &allset);
client[i] = -1;
} else
Writen(sockfd, buf, n);
if (--nready <= 0)
break; /* no more readable descriptors */
}
}
}
}
the "malicious" client code
#include "unp.h"
void
sig_pipe(int signo)
{
printf("SIGPIPE received\n");
return;
}
int
main(int argc, char **argv)
{
int sockfd;
struct sockaddr_in servaddr;
if (argc != 2)
err_quit("usage: tcpcli <IPaddress>");
sockfd = Socket(AF_INET, SOCK_STREAM, 0);
bzero(&servaddr, sizeof(servaddr));
servaddr.sin_family = AF_INET;
servaddr.sin_port = htons(9877);
Inet_pton(AF_INET, argv[1], &servaddr.sin_addr);
Signal(SIGPIPE, sig_pipe);
Connect(sockfd, (SA *) &servaddr, sizeof(servaddr));
Write(sockfd, "h", 1);
printf("go to sleep 20s\n");
sleep(20);
printf("wake up\n");
printf("go to sleep 20s\n");
Write(sockfd, "e", 1);
sleep(20);
printf("wake up\n");
exit(0);
}
I agree with you. The book's conclusion about DoS is wrong. First of all, the book's sample server code does not assume that the input consists of N bytes or ends with a newline, so one byte of input without a following newline does no harm to the server.
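To illustrate the point, here is a sketch of one pass through the server's per-client branch, with plain read()/write() standing in for the book's Read()/Writen() wrappers (the helper name echoAvailable and the 4096-byte buffer standing in for MAXLINE are assumptions). read() returns the single byte that is already queued rather than waiting for a newline, so once select() has reported the descriptor readable, this read does not block:

```cpp
#include <cassert>
#include <unistd.h>  // read(), write(); pipe() and close() in the usage example

// One pass of the book's per-client branch: read whatever is already
// available (possibly a single byte with no newline) and echo it back.
// read() blocks only when nothing is queued, which select() has ruled out.
// Returns the number of bytes echoed, 0 on EOF, or -1 on error.
ssize_t echoAvailable(int inFd, int outFd) {
    char buf[4096];  // assumed size, standing in for the book's MAXLINE
    ssize_t n = read(inFd, buf, sizeof buf);
    if (n <= 0)
        return n;
    ssize_t off = 0;
    while (off < n) {  // keep writing until every byte is out, as Writen() does
        ssize_t w = write(outFd, buf + off, n - off);
        if (w < 0)
            return -1;
        off += w;
    }
    return n;
}
```

Feeding it a pipe that holds just "h" returns 1 immediately, and the server would then go back to select() instead of waiting for the rest of the line.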

Flexible socket application

I'm writing a game that is played over a LAN using sockets. I use a 4-byte length prefix to know how much data follows, like this:
void trust_recv(int sock, int length, char *buffer)
{
int recved = 0;
int justRecv;
while(recved < length) {
justRecv = recv(sock, buffer + recved, length - recved, 0);
if (justRecv < 0) return;
recved += justRecv;
}
}
void onDataArrival(int sock)
{
int length;
char *data;
trust_recv(sock, 4, (char *) &length);
data = new char[length];
trust_recv(sock, length, data);
do_somethings_with_data(data);
}
The problem is that if someone (an intruder or hacker, for example) sends data in a different format (maybe only 2 bytes, or fewer bytes than the 4-byte prefix announces), or there is a network problem, my application goes into a "not responding" state and has to be closed (because I use blocking sockets). How can I make my socket application more robust without switching the socket to non-blocking mode? (Ideas for organizing the data or the algorithm are welcome as well.)
You can set a receive timeout during the socket setup phase, with a setsockopt() call and the SO_RCVTIMEO option:
struct timeval tv;
tv.tv_sec = 8;
tv.tv_usec = 0;
if (setsockopt(your_sock_fd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof tv) < 0)
perror("setsockopt error");
Then test the return value of recv() and its errno:
if (justRecv < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK)
perror("TIMEOUT!");
return;
}
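Beyond the timeout, two things in trust_recv()/onDataArrival() make a broken or hostile peer dangerous: recv() returning 0 (peer closed) is never handled, so the loop spins forever, and the length prefix is trusted blindly before new char[length]. A sketch that handles both, assuming the sender writes the prefix in network byte order (htonl) and assuming a hypothetical 64 KB limit kMaxMessage; recvAll and recvMessage are illustrative names, not from the original:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <arpa/inet.h>   // ntohl()
#include <sys/socket.h>  // recv()
#include <sys/time.h>    // struct timeval, for SO_RCVTIMEO as in the answer above

// Hypothetical upper bound on one message; choose a value that fits the game.
constexpr uint32_t kMaxMessage = 64 * 1024;

// Receive exactly `length` bytes. Returns false if the peer closed the
// connection (recv() == 0), the SO_RCVTIMEO timeout expired
// (EAGAIN/EWOULDBLOCK), or another error occurred.
bool recvAll(int sock, char *buffer, size_t length) {
    size_t received = 0;
    while (received < length) {
        ssize_t n = recv(sock, buffer + received, length - received, 0);
        if (n > 0) {
            received += static_cast<size_t>(n);
        } else if (n < 0 && errno == EINTR) {
            continue;  // interrupted by a signal: retry
        } else {
            return false;  // 0 = peer closed; <0 = timeout or error
        }
    }
    return true;
}

// Read one length-prefixed message into `out`; std::vector also avoids the
// leak of the original new char[length].
bool recvMessage(int sock, std::vector<char> &out) {
    uint32_t netLength = 0;
    if (!recvAll(sock, reinterpret_cast<char *>(&netLength), sizeof netLength))
        return false;
    uint32_t length = ntohl(netLength);
    if (length > kMaxMessage)
        return false;  // malformed or hostile prefix
    out.resize(length);
    return length == 0 || recvAll(sock, out.data(), length);
}
```

When recvMessage() returns false, the connection should be closed: a timeout in the middle of a message leaves the stream out of sync, so the next 4 bytes would no longer be a length prefix.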