Interrupting gen_tcp:recv in Erlang - sockets

We have a gen_server process that manages a pool of passive sockets on the client side: it creates them and lends them out to other processes. Any other process can borrow a socket, send a request to the server over it, get the reply through gen_tcp:recv, and then release the socket back to the gen_server pool process.
The socket pool process monitors all processes that borrow sockets. If any borrowing process dies, the pool gets a 'DOWN' signal for it:
handle_info({'DOWN', Ref, process, _Pid, _Reason}, State) ->
In this case we would like to drain the borrowed socket and put it back into the pool for reuse. The problem is that when we try to drain the socket with gen_tcp:recv(Socket, 0, 0), we get an {error, ealready} from inet, meaning a recv operation is already in progress (the one started by the now-dead borrower).
So the question is how to interrupt the previous recv, successfully drain the socket, and reuse it for other processes.
Thanks.

One more level of indirection will greatly simplify the situation.
Instead of passing sockets to the processes that need to use them, have each socket controlled by a separate process that owns it and represents it within the system. Route Erlang-side messages to and from the sockets as necessary to implement the "borrowing" (even more flexibly, pass the socket controller a callback module that speaks a given protocol, so that as soon as data comes over the network it is interpreted as Erlang messages internally).
If you do this you will never lose control of a socket or leave it in an indeterminate state -- each socket is held by a single owning process the entire time. Instead of having the route-manager/pool-manager process receive the 'DOWN' messages, have each socket controller monitor its current borrower. When a 'DOWN' is received, the controller can change state however is necessary.
You can get yourself into some weird situations passing open file descriptors, sockets and other kinds of ports around among processes that are not designated as their owner. Passing ports and sockets around also becomes a problem if you need to scale a program across several nodes (suddenly you have to care where things are passed and what node they are on, and so on).
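A minimal sketch of that socket-controller idea (the module, function, and state names below are mine, not part of the original system): one gen_server per socket owns it, performs send/recv on the borrower's behalf, and monitors the borrower. Since only the controller ever calls gen_tcp:recv on the socket, there is never a second recv to collide with (the {error, ealready} case), and if a borrower dies the controller is already holding a drained socket that can go straight back to the pool.

-module(socket_ctl).
-behaviour(gen_server).
-export([start_link/1, request/3]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2]).

%% One controller per socket; the pool hands out controller pids, never sockets.
start_link(Socket) ->
    gen_server:start_link(?MODULE, Socket, []).

%% Borrowers never touch the socket; they just call the controller.
request(Ctl, Packet, Timeout) ->
    gen_server:call(Ctl, {request, Packet, Timeout}, infinity).

init(Socket) ->
    {ok, #{socket => Socket}}.

handle_call({request, Packet, Timeout}, {Pid, _Tag}, State = #{socket := S}) ->
    MRef = erlang:monitor(process, Pid),
    ok = gen_tcp:send(S, Packet),
    Reply = gen_tcp:recv(S, 0, Timeout),   %% the only recv ever issued on S
    erlang:demonitor(MRef, [flush]),
    {reply, Reply, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% If a borrower dies mid-conversation, its 'DOWN' arrives here after the recv
%% above has completed; the socket is already drained and can simply be
%% reported back to the pool as free.
handle_info({'DOWN', _MRef, process, _Pid, _Reason}, State) ->
    {noreply, State};
handle_info(_Info, State) ->
    {noreply, State}.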

Related

epoll and multiple processes

Assume I have a process that binds to a socket and then forks itself to create 4 instances of the same process.
The new processes inherit the parent's listening socket file descriptor and are able to accept() on it. If I put the socket descriptor into epoll and try to connect to the socket, all 4 workers are notified (EPOLLIN) that there is something to read/accept. All workers then try to accept; 3 of them fail and only one succeeds.
How can I get around this behaviour?
Having most of the workers fail on every incoming connection is too big a performance penalty.
How can this be avoided?
If accept() returns -1/EWOULDBLOCK/EAGAIN, just return to the poll loop.
Use EPOLLONESHOT when adding the socket to the epoll fd; it notifies only one waiter per event (you then have to re-arm the descriptor with epoll_ctl after handling it).
Or you can guard the epoll fd with a lock so that only one worker waits on it at a time.
This is the thundering herd problem, and it can be solved with the EPOLLEXCLUSIVE flag (Linux 4.5+). It ensures that only one of the processes waiting on the descriptor is woken up per event.
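A rough sketch of that approach in C-style C++ (the listening socket is assumed to already be bound and listening, and error handling is trimmed): each forked worker builds its own epoll instance, registers the inherited listening descriptor with EPOLLEXCLUSIVE so the kernel wakes only one waiter per connection, and treats a failed accept as "another worker got there first", as the first answer above suggests.

#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

void worker_loop(int listen_fd) {
    // Non-blocking listener: accept() returns EAGAIN instead of blocking when
    // another worker has already taken the pending connection.
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL) | O_NONBLOCK);

    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN | EPOLLEXCLUSIVE;   // Linux 4.5+: wake one waiter, not all
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[16];
    for (;;) {
        int n = epoll_wait(ep, events, 16, -1);
        for (int i = 0; i < n; ++i) {
            for (;;) {                      // drain the accept queue
                int conn = accept(listen_fd, nullptr, nullptr);
                if (conn < 0) break;        // EAGAIN/EWOULDBLOCK: nothing left
                // handle_connection(conn); // hypothetical handler
                close(conn);
            }
        }
    }
}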

Can a single process perform concurrent non-blocking IO over the same port?

I am trying to understand how concurrency works at a system level.
Backstory
I have an application and a datastore. The datastore can have several processes running and so can handle multiple requests concurrently. The datastore accepts communication over a single TCP port using a protocol in the format <msg length> <operation code> <operation data>
The existing application code blocks on datastore I/O. I could spin up several threads to achieve concurrency, but each thread would still block on I/O. I have some single-threaded non-blocking I/O libraries, but using them would require me to do some socket programming.
Question
How would a single-process connection pool to a single non-blocking port work? From what I understand, the port maintains a sort of mapping so it can send the response to the correct place when a response is ready. But I read that it uses the requestor's IP address as the key. If multiple requests to the same port occur from the same process, wouldn't the messages get mixed up / intermingled?
Or does each connection get assigned a unique key, so that to make a connection pool I need only store a list of connection objects, and they are guaranteed never to interact with each other?
Edit: I don't know why I said TCP, and half the content of this question is unnecessary ... I am embarrassed. Probably ought to delete it, actually. I voted.
The datastore accepts communication over a single TCP port
The result of accept() is a new full-duplex socket which can be read and written concurrently and independently of all other sockets in the process. The fact that its local port is shared is irrelevant: a TCP connection is identified by the full four-tuple {local IP, local port, remote IP, remote port}, and TCP ports aren't physical objects, only numbers.
Non-blocking mode and data stores have nothing to do with it.
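To make that concrete from the connecting side, here is a hedged sketch (the address 127.0.0.1 and port 6379 are placeholders, and error handling is omitted): two connections from the same process to the same server port are still completely independent sockets, each with its own local ephemeral port, so their byte streams can never intermingle.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <cstdio>
#include <unistd.h>

static int connect_to(const char *ip, uint16_t port) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    inet_pton(AF_INET, ip, &addr.sin_addr);
    connect(fd, (sockaddr *)&addr, sizeof addr);   // error handling omitted
    return fd;
}

int main() {
    // Both sockets target the same remote ip:port (placeholder values).
    int a = connect_to("127.0.0.1", 6379);
    int b = connect_to("127.0.0.1", 6379);

    // Each connection is distinguished by its full 4-tuple
    // (local IP, local port, remote IP, remote port).
    sockaddr_in la{}, lb{};
    socklen_t len = sizeof la;
    getsockname(a, (sockaddr *)&la, &len);
    len = sizeof lb;
    getsockname(b, (sockaddr *)&lb, &len);
    printf("local ports: %u and %u\n",
           (unsigned)ntohs(la.sin_port), (unsigned)ntohs(lb.sin_port));

    close(a);
    close(b);
    return 0;
}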

Adding a socket descriptor to the io_service dynamically and removing it

I am writing a gateway service which listens on a network socket and routes the packets it receives to separate daemons. I am planning to use Boost.Asio but I am stuck on a few questions. Here is the design of the server I am planning to implement:
The gateway will be listening for TCP connections using boost asio.
The gateway will also listen for streamed Unix domain connections from daemons using boost asio.
Whenever there is a packet on a TCP connection, the gateway looks at the protocol tag in the packet and puts the packet on the Unix domain connection on which the corresponding service is listening.
Whenever there is a packet on a service connection, the gateway looks at the client tag and puts it on the respective client connection.
Every descriptor in the gateway will be non-blocking.
I am stuck on one particular problem: when the gateway is writing to a service connection, there is a chance of getting an EAGAIN or EWOULDBLOCK error if the service socket's buffer is full. I plan to tackle this by queuing the buffers and waiting for the service connection to become ready for writing.
If I were using the select system call, "waiting for the service connection to become ready for writing" would translate to adding the fd to the writefds set and passing it to select. Once the service connection is ready for writing, I would write the enqueued buffers to the connection and remove it from select's writefds set.
How do I do the same thing with Boost.Asio? Is such a thing possible?
If you want to go with that approach, use boost::asio::null_buffers to enable Reactor-Style operations. Additionally, set the Boost.Asio socket to non-blocking through the socket::non_blocking() member function. This makes the synchronous socket operations non-blocking, which is different from setting the native socket to non-blocking: Boost.Asio already sets the native socket to non-blocking and emulates blocking for its synchronous operations.
However, if Proactor-Style operations are an option, then consider using them, as they allow the application to ignore some of the lower-level details. When using Proactor-Style operations, Boost.Asio performs the I/O on the application's behalf, properly handling the EWOULDBLOCK, EAGAIN, and ERROR_RETRY logic. For example, when Boost.Asio incurs one of those errors, it pushes the I/O operation back into its internal queue, deferring its reattempt and allowing other operations to be attempted.
Often there are two constraints which require the use of Reactor-Style operations instead of Proactor-Style operations:
Another library expects to perform the I/O operations itself.
Memory limitations. With a Proactor, the lifespan of a buffer must exceed the duration of a read or write operation, and concurrent operations may require their own buffer. A Reactor allows for the lifetime of a buffer to begin when data is ready to be read, and end when data is no longer being used.
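Here is a hedged sketch of the Reactor-Style approach described in this answer, using the older io_service/null_buffers interface. A TCP socket stands in for the Unix domain service connection for brevity, and the class and queueing logic are illustrative, not a drop-in gateway implementation.

#include <boost/asio.hpp>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

using boost::asio::ip::tcp;

class service_conn {
public:
    // The socket is assumed to already be connected to the daemon.
    explicit service_conn(boost::asio::io_service& io) : socket_(io) {}

    void queue_packet(std::vector<char> packet) {
        pending_.push_back(std::move(packet));
        if (!waiting_) wait_writable();   // the analogue of adding the fd to writefds
    }

private:
    void wait_writable() {
        waiting_ = true;
        // null_buffers: the operation completes when the socket becomes
        // writable, without transferring any data itself.
        socket_.async_write_some(boost::asio::null_buffers(),
            [this](const boost::system::error_code& ec, std::size_t) {
                waiting_ = false;
                if (!ec) drain();
            });
    }

    void drain() {
        socket_.non_blocking(true);       // synchronous ops now return would_block
        while (!pending_.empty()) {
            boost::system::error_code ec;
            std::size_t n = socket_.write_some(
                boost::asio::buffer(pending_.front()), ec);
            if (ec == boost::asio::error::would_block) {
                wait_writable();          // buffer full again: wait once more
                return;
            }
            if (ec) return;               // real error: close/report in real code
            // Partial write: drop what was sent, keep the rest queued.
            std::vector<char>& front = pending_.front();
            front.erase(front.begin(), front.begin() + static_cast<std::ptrdiff_t>(n));
            if (front.empty()) pending_.pop_front();
        }
    }

    tcp::socket socket_;
    std::deque<std::vector<char>> pending_;
    bool waiting_ = false;
};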
Using boost::asio you don't need to mess with non-blocking mode or with return codes such as EAGAIN, EWOULDBLOCK, etc. Nor do you "add a socket to a poll loop" or anything like that; all of this is hidden from you, since Asio is a higher-level framework.
The typical pattern is:
You create an io_service object.
You create a socket bound to the io_service.
You start some async operation (async_connect, async_read, async_write, and so on) on the socket.
You run the dispatch loop with io_service::run or a similar method.
Asio will invoke your handler when the time comes.
Check out the examples on the boost::asio page; I think the async echo server illustrates the technique for your task.
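In the same spirit, a minimal Proactor-Style sketch of the steps listed above (the address, port, and payload are placeholders); note that the handlers never see EAGAIN or EWOULDBLOCK, only a clean error_code.

#include <boost/asio.hpp>
#include <array>
#include <iostream>

using boost::asio::ip::tcp;

int main() {
    boost::asio::io_service io;                        // the io_service
    tcp::socket socket(io);                            // a socket bound to it

    tcp::endpoint server(boost::asio::ip::address::from_string("127.0.0.1"), 5000);
    std::array<char, 1024> reply{};

    socket.async_connect(server,                       // chain async operations
        [&](const boost::system::error_code& ec) {
            if (ec) { std::cerr << "connect: " << ec.message() << "\n"; return; }
            boost::asio::async_write(socket, boost::asio::buffer("ping", 4),
                [&](const boost::system::error_code& ec, std::size_t) {
                    if (ec) return;
                    socket.async_read_some(boost::asio::buffer(reply),
                        [&](const boost::system::error_code& ec, std::size_t n) {
                            if (!ec) std::cout.write(reply.data(), n);
                        });
                });
        });

    io.run();                                          // run the dispatcher;
    return 0;                                          // handlers fire inside run()
}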
If multiple threads will be writing to the same socket object used for a connection, then you need to use a mutex (or critical section if using Windows) to serialize access to it.
As for "when the gateway is writing to the service connection, there are chances of getting an EAGAIN or EWOULDBLOCK error if the service socket is full": I believe that ASIO handles that for you internally, so you don't have to worry about it.

Socket programming - API doubt

There was this question posted in class today about API design in socket programming.
Why are listen() and accept() provided as different functions and not merged into one function?
Now as far as I know, listen marks a bound socket as ready to accept connections and sets an upper bound on the queue of incoming connections. If accept and listen were merged, could such a queue not be maintained?
Or is there some other explanation?
Thanks in advance.
listen() means "start listening for clients"
accept() means "accept a client, blocking until one connects if necessary"
It makes sense to separate these two, because if they were merged, then the single merged function would block. This would cause problems for non-blocking I/O programs.
For example, let's take a typical server that wants to listen for new client connections but also monitor existing client connections for new messages. A server like this typically uses a non-blocking I/O model so that it is not blocked on any one particular socket. So it needs a way to "start listening" on the server socket without blocking on it. Once listening on the server socket has been initiated, the server socket is added to the set of sockets being monitored via select() (called poll() on some systems). The select() call indicates when there is a client pending on the server socket. The program can then call accept() without fear of blocking on that socket.
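A tiny sketch of that select()-then-accept() pattern (only the listening socket is shown; a real server would add its existing client sockets to the read set as well):

#include <sys/select.h>
#include <sys/socket.h>

void serve_once(int listen_fd) {
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(listen_fd, &readfds);     // existing client sockets would be added too

    if (select(listen_fd + 1, &readfds, nullptr, nullptr, nullptr) > 0 &&
        FD_ISSET(listen_fd, &readfds)) {
        int client = accept(listen_fd, nullptr, nullptr);  // a client is pending,
        (void)client;                // so this should not block; hand it off to
    }                                // whatever handles clients
}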
listen(2) makes the given TCP socket a server socket, i.e. it creates a queue for accepting connection requests from clients. Only the listening side's port, and possibly IP address, are bound (which is why you call bind(2) before listen(2)). accept(2) then actually takes such a connection request from that queue and turns it into a connected socket (the four parts required for two-way communication - source IP address, source port, destination IP address, and destination port - are all assigned). listen(2) is called only once, while accept(2) is usually called multiple times.
Under the hood, bind assigns an address and a port to a socket descriptor. The port is then reserved for that socket, so the system won't assign the same port to another application (an exception exists, but I won't go into details here). It's also a one-time-per-socket operation.
Then listen is responsible for establishing the number of connections that can be queued for a given socket descriptor, and for indicating that you're now willing to receive connections.
On the other hand, accept is used to dequeue the first connection from the queue of pending connections, and create a new socket to handle further communication through it. It may be called multiple times, and generally is. By default, this operation is blocking if there are no connections in the queue.
Now suppose you want to use an async IO mechanism (like epoll, poll, kqueue, select, etc). If listen and accept were a single API, how would you indicate that a given socket is willing to receive connections? The async mechanism needs to know you wish to handle this type of event as well.
Since the two calls have quite different semantics, it makes sense to keep them apart.
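Putting those answers together, a minimal sketch (port 8080 and the backlog of 128 are arbitrary): bind and listen are each called once to set up the passive socket, while accept is called in a loop, yielding a new connected socket per client.

#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

int main() {
    int srv = socket(AF_INET, SOCK_STREAM, 0);

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    bind(srv, (sockaddr *)&addr, sizeof addr);       // reserve the address/port

    listen(srv, 128);                                // called once: start queueing
                                                     // connection requests

    for (;;) {                                       // called many times: one
        int client = accept(srv, nullptr, nullptr);  // connected socket per client
        if (client < 0) continue;
        // talk to the client over 'client', then:
        close(client);
    }
}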

In socket programming, can you accept() on the same listening socket in multiple processes (threads)?

i.e.
open a listening socket in the parent process
call epoll_wait(listening_socket) in child1, child2, child3, ...
call accept in each child if there is a connection request
In general, it's not a good idea to have multiple threads performing IO on the same socket without some kind of synchronization between them. In your scenario, it's possible you'd see something like:
incoming connection request wakes up epoll_wait in all N child threads
all N threads call accept, 1 call succeeds, N-1 block (or fail, if your listening socket is non-blocking)
The more usual approach is to have the parent thread loop calling accept on the listening socket, and starting a child thread for each incoming request. (Or, if you're concerned about thread creation overhead, you can have a pool of child threads that wait on a condition variable when idle; the parent adds the newly-accepted socket to a queue and uses pthread_cond_signal to wake a child to handle it.)
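A hedged sketch of that more usual approach, written with std::thread and std::condition_variable rather than pthreads (accept_loop is a hypothetical entry point, the pool size is arbitrary, and listen_fd is assumed to already be listening):

#include <sys/socket.h>
#include <unistd.h>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

void accept_loop(int listen_fd) {
    std::queue<int> pending;
    std::mutex m;
    std::condition_variable cv;

    // Pre-created pool: each worker sleeps on the condition variable when idle.
    std::vector<std::thread> pool;
    for (int i = 0; i < 4; ++i) {
        pool.emplace_back([&] {
            for (;;) {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return !pending.empty(); });
                int conn = pending.front();
                pending.pop();
                lock.unlock();
                // handle the client here, then:
                close(conn);
            }
        });
    }

    for (;;) {
        int conn = accept(listen_fd, nullptr, nullptr);  // only the parent accepts
        if (conn < 0) continue;
        {
            std::lock_guard<std::mutex> lock(m);
            pending.push(conn);                          // enqueue the new socket
        }
        cv.notify_one();                                 // wake exactly one worker
    }
}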
Yes you can, but your example is a little incomplete:
Create listening_socket
Create epoll set using epoll_create
Register listening_socket with the epoll set using epoll_ctl
call epoll_wait(epoll set) in child1,child2,child3....
call accept in each child if there is connection request
epoll_wait makes sure that only one thread gets the connection event, and only that thread will call accept.
If you create two epoll sets and register your listening_socket with both of them, you will receive the event twice, once per epoll set; this is not recommended.
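A hedged sketch of that recommended setup: one listening socket, one epoll set, and several threads blocked in epoll_wait on that same set (the port and thread count are arbitrary; the listener is also made non-blocking so that, if more than one thread ever wakes up, the losers see a failed accept instead of blocking).

#include <fcntl.h>
#include <netinet/in.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <thread>
#include <vector>

int main() {
    // Create listening_socket in the parent.
    int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(9000);
    bind(listen_fd, (sockaddr *)&addr, sizeof addr);
    listen(listen_fd, 128);
    fcntl(listen_fd, F_SETFL, fcntl(listen_fd, F_GETFL) | O_NONBLOCK);

    // Create the epoll set and register listening_socket with it.
    int ep = epoll_create1(0);
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    // Each child waits on the same epoll set and accepts when notified.
    std::vector<std::thread> children;
    for (int i = 0; i < 3; ++i) {
        children.emplace_back([ep, listen_fd] {
            epoll_event events[8];
            for (;;) {
                int n = epoll_wait(ep, events, 8, -1);
                for (int j = 0; j < n; ++j) {
                    int conn = accept(listen_fd, nullptr, nullptr);
                    if (conn < 0) continue;   // EAGAIN: another thread took it
                    close(conn);              // real code would handle the client
                }
            }
        });
    }
    for (auto &t : children) t.join();
}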
You may refer to this tutorial http://www.devshed.com/c/a/BrainDump/Linux-Files-and-the-Event-Poll-Interface/ and search for some interesting discussions about epoll on this forum http://www.developerweb.net/forum/ to learn more.
For more elaborate examples you can always refer to the libev, libevent or nginx source code.