I've noticed that it is possible to reuse a socket in another process by using the socket's fileno attribute (from the socket module).
However, I cannot find a way to achieve the same thing using aiohttp... how can I reuse sockets from one process in another by passing e.g. a fileno?
The idea is that I want the first process to lose access to the socket (without closing it) and start reusing the connection in another process.
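For reference, the OS-level mechanism behind this kind of fileno reuse is file descriptor passing over a Unix domain socket (SCM_RIGHTS). A minimal C++/POSIX sketch of just that mechanism, independent of aiohttp (the helper names here are mine):

    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Send one descriptor over a Unix domain socket via SCM_RIGHTS.
    // 'chan' is one end of e.g. socketpair(AF_UNIX, SOCK_STREAM, 0, sv).
    int send_fd(int chan, int fd) {
        char byte = 'F';                    // must carry at least one byte
        iovec iov{&byte, 1};
        char ctrl[CMSG_SPACE(sizeof(int))] = {};
        msghdr msg{};
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof(ctrl);
        cmsghdr* cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cm), &fd, sizeof(int));
        return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
    }

    // Receive the descriptor; the returned fd is a new descriptor in
    // this process referring to the same open connection.
    int recv_fd(int chan) {
        char byte;
        iovec iov{&byte, 1};
        char ctrl[CMSG_SPACE(sizeof(int))] = {};
        msghdr msg{};
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctrl;
        msg.msg_controllen = sizeof(ctrl);
        if (recvmsg(chan, &msg, 0) <= 0) return -1;
        cmsghdr* cm = CMSG_FIRSTHDR(&msg);
        if (!cm || cm->cmsg_type != SCM_RIGHTS) return -1;
        int fd;
        memcpy(&fd, CMSG_DATA(cm), sizeof(int));
        return fd;
    }

Once the receiver has its copy, the sender can close() its own descriptor; the underlying connection stays open because the receiver's descriptor still refers to it, which gives exactly the "lose access without closing" behaviour described above.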
We have a gen_server process that manages a pool of passive sockets on the client side by creating them and lending them out to other processes. Any other process can borrow a socket, send a request to the server using the socket, get a reply through gen_tcp:recv, and then release the socket back to the gen_server pool process.
The socket pool process monitors all processes that borrow sockets. If any borrowing process dies, the pool gets a down signal from it:
handle_info({'DOWN', Ref, process, _Pid, _Reason}, State) ->
In this case we would like to drain the borrowed socket and reuse it by putting it back into the pool. The problem is that while trying to drain the socket using gen_tcp:recv(Socket, 0, 0), we get an ealready error from inet, meaning that a recv operation is already in progress.
So the question is how to interrupt the previous recv, successfully drain the socket, and reuse it for other processes.
Thanks.
One more level of indirection will greatly simplify the situation.
Instead of passing sockets to processes that need to use them, have each socket controlled by a separate process that owns it and represents the socket within the system. Route Erlang-side messages to and from sockets as necessary to implement the "borrowing" of sockets (even more flexibly, pass the socket controller a callback module that speaks a given protocol, so as soon as data comes over the network it is interpreted as Erlang messages internally).
If this is done you will not lose control of sockets or have them in indeterminate states -- they will instead be held by a single, owning process the entire time. Instead of having the route-manager/pool-manager process receive the 'DOWN' messages, have each socket controller monitor its current borrowing process. When a 'DOWN' is received it can then change state according to whatever is necessary.
You can catch yourself in some weird situations passing open file descriptors, sockets and other types of ports around among processes that aren't designated as their owners. Passing ports and sockets around also becomes a problem if you need to scale a program across several nodes (suddenly you have to care where things are passed and what node they are on, etc.).
Network programming noob here,
I'm confused by the behavior of the accept and connect socket functions. In most programming languages, wrappers for those functions return different types of values: accept returns a new descriptor that we can use to send/receive data, but connect returns nothing (or returns an error code).
To me it looks like connect should also return a descriptor. They both open a channel between two sockets, but only one of the functions return something useful to communicate with the remote socket.
This affects the way I structure my program. For example, I can easily spawn a new worker/thread/etc. for every incoming connection, but it's not as easy for every connection that I create using connect, because I don't have a new descriptor in that case (so I can't use recv and send without doing some bookkeeping).
Can anyone explain to me why it works this way?
I think the reason is that socket wrappers in programming languages follow the BSD API closely, in which case my question becomes: why do BSD sockets work this way? The current design leads to unnecessarily complex programs or redundant sockets. I either need to do more bookkeeping (leads to more complex programs) or create a new socket for every outgoing connection (leads to redundant sockets).
Thanks.
connect() takes an existing descriptor as input. You create and configure the descriptor first and then connect() it to the server. So there is no need for it to return a new descriptor since you have to create the descriptor beforehand.
accept() also takes an existing descriptor as input, however that descriptor represents the listening socket. When a client is accepted, a unique descriptor is needed for reading/writing with that particular client, the listening descriptor cannot be used for that, so accept() returns a new descriptor.
You don't need to structure your thread differently. On the client side, after you connect() to the server, spawn a thread and give it the descriptor that was connected. On the server side, after you accept() a client, spawn a thread and give it the descriptor that was accepted. In both cases, the thread only has to care about which descriptor to operate on, not where that descriptor came from. Both threads can use recv() and send() as needed, and then close() the descriptor when done using it.
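In BSD-socket terms the symmetry looks like this (a minimal C++/POSIX sketch; the two functions would run in separate processes, and the address and port are placeholders):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>

    // Client: the descriptor exists before connect(), so connect() has
    // nothing new to return -- fd itself is the channel afterwards.
    void run_client() {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in srv{};
        srv.sin_family = AF_INET;
        srv.sin_port = htons(7000);
        inet_pton(AF_INET, "127.0.0.1", &srv.sin_addr);
        if (connect(fd, (sockaddr*)&srv, sizeof(srv)) == 0)
            send(fd, "ping", 4, 0);              // use the same fd
        close(fd);
    }

    // Server: the listening descriptor must stay available to accept
    // further clients, so each accepted client gets its own descriptor.
    void run_server() {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(7000);
        bind(lfd, (sockaddr*)&addr, sizeof(addr));
        listen(lfd, 16);
        int cfd = accept(lfd, nullptr, nullptr); // new fd per client
        char buf[4];
        recv(cfd, buf, sizeof(buf), 0);          // talk on cfd, not lfd
        close(cfd);
        close(lfd);
    }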
You cannot reuse a socket descriptor for a new connection (well, WinSock2 on Windows has non-standard extensions to allow that, but that feature is not commonly used). Once a connection is disconnected, its descriptors must be closed. You have to create new descriptors whenever you need to create a new connection.
I am trying to understand how concurrency works at a system level.
Backstory
I have an application and a datastore. The datastore can have several processes running and so can handle multiple requests concurrently. The datastore accepts communication over a single TCP port using a protocol in the format <msg length> <operation code> <operation data>
The existing application code blocks on datastore IO. I could spin up several threads to achieve concurrency, but each thread would still block on IO. I have some single-threaded non-blocking IO libraries, but using them would require me to do some socket programming.
Question
How would a single-process connection pool to a single non-blocking port work? From what I understand, the port maintains a sort of mapping so it can send each response to the correct place when it is ready. But I read that it uses the requestor's IP as the key. If multiple requests to the same port come from the same process, wouldn't the messages get mixed up / intermingled?
Or does each connection get assigned a unique key, so that to make a connection pool I need only store a list of connection objects, and they are guaranteed never to interact with each other?
Edit: I don't know why I said TCP, and half the content of this question is unnecessary... I am embarrassed. Probably ought to delete it, actually. I voted.
The datastore accepts communication over a single TCP port
The result of the accept() is a new full-duplex socket which can be read and written to concurrently and independently of all other sockets in the process. The fact that its local port is shared is irrelevant. TCP ports aren't physical objects, only numbers.
Non-blocking mode and data stores have nothing to do with it.
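On the client side the same logic applies: every connect() produces its own descriptor, and the kernel tells connections apart by the full 4-tuple (local IP, local port, remote IP, remote port), with each client socket getting its own ephemeral local port. So a pool really is just a list of descriptors. A C++/POSIX sketch with placeholder address details:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <vector>

    // Open 'size' independent connections to the same host:port.
    // As long as each in-flight request borrows one descriptor
    // exclusively, replies can never intermingle between connections.
    std::vector<int> make_pool(const char* host, int port, int size) {
        std::vector<int> pool;
        sockaddr_in srv{};
        srv.sin_family = AF_INET;
        srv.sin_port = htons(port);
        inet_pton(AF_INET, host, &srv.sin_addr);
        for (int i = 0; i < size; ++i) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            if (fd < 0 || connect(fd, (sockaddr*)&srv, sizeof(srv)) < 0) {
                if (fd >= 0) close(fd);
                break;                   // real code: report the error
            }
            pool.push_back(fd);          // bytes on fd never mix with others
        }
        return pool;
    }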
I am writing a gateway service which listens on a network socket and routes the packets received to separate daemons. I am planning to use boost asio but I am stuck on a few questions. Here is the design of the server I am planning to implement:
The gateway will be listening for TCP connections using boost asio.
The gateway will also listen for streamed Unix domain connections from daemons using boost asio.
Whenever there is a packet on the TCP connection, the gateway looks at the protocol tag in the packet and puts the packet on the Unix domain connection on which the corresponding service is listening.
Whenever there is a packet on a service connection, the gateway looks at the client tag and puts the packet on the respective client connection.
Every descriptor in the gateway will be non-blocking.
I am stuck on one particular problem: when the gateway is writing to a service connection, there is a chance of getting an EAGAIN or EWOULDBLOCK error if the service socket's buffer is full. I plan to tackle this by queuing the buffers and waiting for the service connection to become ready for writing.
If I were to use the select system call, "waiting for the service connection to become ready for writing" would translate to adding the fd to the writefds set and passing it to select. Once the service connection is ready for writing, I would write the enqueued buffers to the connection and remove the fd from select's writefds set.
How do I do the same thing with boost asio? Is such a thing possible?
If you want to go with that approach, then use boost::asio::null_buffers to enable Reactor-Style operations. Additionally, set the Boost.Asio socket to non-blocking through the socket::non_blocking() member function, which makes the synchronous socket operations non-blocking. This is different from setting the native socket to non-blocking yourself: Boost.Asio already sets the native socket to non-blocking internally and emulates blocking behavior for synchronous operations.
However, if Proactor-Style operations are an option, then consider using them, as it allows the application to ignore some of the lower level details. When using proactor style operations, Boost.Asio will perform the I/O on the application's behalf, properly handling EWOULDBLOCK, EAGAIN, and ERROR_RETRY logic. For example, when Boost.Asio incurs one of the previously mentioned errors, it pushes the I/O operation back into its internal queue, deferring its reattempt, allowing other operations to be attempted.
Often, there are two constraints that require the use of Reactor-Style operations instead of Proactor-Style operations:
Another library expects to perform the I/O operations itself.
Memory limitations. With a Proactor, the lifespan of a buffer must exceed the duration of a read or write operation, and concurrent operations may require their own buffer. A Reactor allows for the lifetime of a buffer to begin when data is ready to be read, and end when data is no longer being used.
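For the write path the question describes, a reactor-style sketch might look like this (names such as Writer are mine; the socket is assumed already connected, with non_blocking(true) set):

    #include <boost/asio.hpp>
    #include <deque>
    #include <vector>

    using boost::asio::ip::tcp;

    class Writer {
    public:
        explicit Writer(boost::asio::io_service& io) : socket_(io) {}

        tcp::socket& socket() { return socket_; }

        void enqueue(std::vector<char> buf) {
            bool idle = queue_.empty();
            queue_.push_back(std::move(buf));
            if (idle) await_writable();   // arm the readiness wait once
        }

    private:
        void await_writable() {
            // async_write_some with null_buffers() completes when the
            // socket becomes writable, without transferring data -- the
            // equivalent of putting the fd in select()'s writefds set.
            socket_.async_write_some(boost::asio::null_buffers(),
                [this](const boost::system::error_code& ec, std::size_t) {
                    if (ec) return;       // socket error: tear down elsewhere
                    boost::system::error_code wec;
                    while (!queue_.empty()) {
                        std::size_t n = socket_.write_some(
                            boost::asio::buffer(queue_.front()), wec);
                        if (wec == boost::asio::error::would_block) {
                            await_writable();  // buffer full again: re-arm
                            return;
                        }
                        if (wec) return;       // hard error
                        auto& front = queue_.front();
                        front.erase(front.begin(), front.begin() + n);
                        if (front.empty()) queue_.pop_front();
                    }
                });
        }

        tcp::socket socket_;
        std::deque<std::vector<char>> queue_;
    };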
Using boost::asio you don't need to mess with non-blocking mode and/or with return codes such as EAGAIN, EWOULDBLOCK, etc. Also, you are not "adding a socket to a polling loop" or anything like that; all of this is hidden from you, since asio is a higher-level framework.
The typical pattern is:
You create io_service object
You create socket with binding to io_service
You create some async event (async_connect, async_read, async_write or so on) on the socket.
You run dispatching with io_service::run or similar methods.
asio will trigger your handler when the time comes.
Check out the examples on the boost::asio page. I think the async echo server illustrates the technique for your task.
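A minimal sketch of that sequence (the address and port are placeholders):

    #include <boost/asio.hpp>
    #include <iostream>

    using boost::asio::ip::tcp;

    int main() {
        boost::asio::io_service io_service;          // 1. the dispatcher
        tcp::socket socket(io_service);              // 2. socket bound to it

        tcp::endpoint server(
            boost::asio::ip::address::from_string("127.0.0.1"), 5000);

        // 3. queue an async operation; the lambda is the handler
        socket.async_connect(server,
            [&socket](const boost::system::error_code& ec) {
                if (ec) { std::cerr << ec.message() << "\n"; return; }
                static const char msg[] = "hello";
                boost::asio::async_write(socket,
                    boost::asio::buffer(msg, sizeof(msg) - 1),
                    [](const boost::system::error_code& ec, std::size_t n) {
                        if (!ec) std::cout << "wrote " << n << " bytes\n";
                    });
            });

        io_service.run();                            // 4. dispatch until done
    }

Note there is no explicit non-blocking flag and no EAGAIN handling anywhere; all of that is inside asio.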
If multiple threads will be writing to the same socket object used for a connection, then you need to use a mutex (or critical section if using Windows) to single thread the code.
As for - "when the gateway is writing to the service connection, there are chances of getting an EAGAIN or EWOULDBLOCK error if the service socket is full", I believe that ASIO handles that for you internally so you don't have to worry about it.
I get a socket from the accept function in the main process, and two or more threads can send data through it. Access to the socket must then be mutually exclusive when two or more threads want to send data through it in parallel. My question is whether the OS adds a lock to the connected socket at the bottom of the system.
Since you mention accept(), I take it we are talking about stream sockets.
You can send simultaneously from multiple threads or processes on the same socket, but there is no guarantee that the data from multiple senders will not be interleaved together. So you probably don't want to do it.
If you are sending small amounts of data at a time that don't cause the socket to block, you can probably expect the data blocks submitted to each simultaneous send()/write() call to arrive contiguously at the other end. PROBABLY. You can't count on it.
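If you do want multiple threads sending on the same socket, the usual fix is a mutex per socket, held for one complete message at a time, so messages cannot interleave on the wire. A C++ sketch (error handling reduced to a bool):

    #include <sys/socket.h>
    #include <cstddef>
    #include <mutex>
    #include <string>

    std::mutex sock_mutex;   // one mutex per connected socket

    // send() may accept only part of the buffer, so loop until done.
    bool send_all(int fd, const char* data, std::size_t len) {
        while (len > 0) {
            ssize_t n = send(fd, data, len, 0);
            if (n <= 0) return false;
            data += n;
            len -= static_cast<std::size_t>(n);
        }
        return true;
    }

    // Serialize whole messages: any thread may call this safely.
    bool send_message(int fd, const std::string& msg) {
        std::lock_guard<std::mutex> lock(sock_mutex);
        return send_all(fd, msg.data(), msg.size());
    }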