One socket or two for inter-process communication on single host - sockets

If I want to use (UDP) sockets as an inter-process communication mechanism on a single PC, are there restrictions on what I can set up due to the two endpoints having the same IP address?
I imagine that in order to have two processes A and B both listening on the same IP/port address, SO_REUSEADDR would be necessary - correct? And even though that might conceptually allow for full duplex comms over a single socket, there are other questions I have if I try to go full duplex:
would I end up receiving my own transmissions, and have to filter them out?
would I be exposing myself to other processes injecting spurious or malicious data into my sockets due to the use of SO_REUSEADDR... or do I face this possibility simply by using (connectionless) UDP?
how would things be different (in an addressing/security/restrictions sense) if I chose to use TCP instead?
I'm confident that there is a viable solution using two sockets at each end (one for A -> B data, one for B ->A data)... but is there a viable solution using a single socket at each end? Would there be any clear advantages to using one full-duplex socket per process if it is possible?

The question arises from a misunderstanding. The misunderstanding arises from reading variable names like receivePort and sendPort with different values, and reading them as if they have an implicit link to the socket at the local end. This might make one (mistakenly) believe that two sockets are being used, or must be used - one for send, one for receive. This is wrong - a single socket (at each end) is all that is required.
If using variables to refer to ports on a single host, it is preferable to name them such that it is clear that one is local or pertaining to "this" process, and the other is remote or peer and pertains to the address of a different process, despite being on the same local host. Then it should be clearer that, like any socket, it is entirely possible to support both send and receive from the single socket with its single port number.
In this scenario (inter-process communication on the same host necessarily using different port numbers for the single socket at each end) all the other questions (SO_REUSEADDR, TCP vs UDP and receiving one's own transmissions) are distractions arising from the misunderstanding.
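A minimal sketch of that arrangement, assuming Python and simulating both processes inside one script: each end has exactly one UDP socket on its own loopback port and uses it for both sending and receiving. The names make_endpoint, sock_a and sock_b are illustrative only, and letting the OS pick the ports is a convenience for the demo (real processes would agree on port numbers in advance).

```python
import socket

LOOPBACK = "127.0.0.1"


def make_endpoint():
    """Create one UDP socket bound to a loopback port chosen by the OS."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((LOOPBACK, 0))  # port 0 asks the OS for any free port
    sock.settimeout(2.0)      # avoid hanging forever if a datagram is lost
    return sock


# "Process A" and "process B" are simulated here inside a single script.
sock_a = make_endpoint()
sock_b = make_endpoint()
addr_a = sock_a.getsockname()  # local address for A, peer address for B
addr_b = sock_b.getsockname()  # local address for B, peer address for A

# A -> B and B -> A both travel over the single socket at each end.
sock_a.sendto(b"ping from A", addr_b)
data_at_b, sender_at_b = sock_b.recvfrom(1024)
sock_b.sendto(b"pong from B", sender_at_b)
data_at_a, sender_at_a = sock_a.recvfrom(1024)

print(data_at_b, sender_at_b == addr_a)
print(data_at_a, sender_at_a == addr_b)

sock_a.close()
sock_b.close()
```

Because the two sockets are bound to different ports, neither end sees its own transmissions and SO_REUSEADDR is not involved.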

Related

Difference between port number and socket

I started reading UNIX Network Programming by W. Richard Stevens and I am very confused about the difference between a port and a socket. When I read about it on the internet, it said that a socket is an endpoint for a connection, and for the port number it said that an IP address and port number form a unique pair.
So now my question is that :
(1) What is the difference between these two ?
(2)How are sockets and ports internally manipulated. Are sockets a file ?
(3) How is data sent when we send it using an application ?
(4) If sockets are there then why do we use port numbers ?
Sorry for my English.. Thanks in advance for the reply.
(1) What is the difference between these two ?
A computer running IP networking always has a fixed range of port numbers: 65536 for TCP and 65536 for UDP (0 through 65535, with port 0 reserved). The TCP or UDP header of each packet contains a 16-bit unsigned destination-port field specifying which of those ports the packet should be delivered to.
Sockets, on the other hand, are demand-allocated by each program. A socket serves as a handle/interface between the program and the OS's networking stack, and is used to build and specify a context for a particular networking task. A socket may or may not be bound to a port, and it's also possible (and common) to have more than one socket bound to a particular port at the same time.
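As a rough illustration of several sockets sharing one port, the sketch below (Python, loopback only, all variable names hypothetical) opens a listening TCP socket and accepts two connections. Each accepted connection is a separate socket, yet all three report the same local port number.

```python
import socket

# One listening socket, bound to a port chosen by the OS.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen()
port = listener.getsockname()[1]

# Two clients connect to that one port; each accept() yields a new socket.
client1 = socket.create_connection(("127.0.0.1", port))
conn1, _ = listener.accept()
client2 = socket.create_connection(("127.0.0.1", port))
conn2, _ = listener.accept()

# Three distinct sockets (distinct descriptors) ...
distinct_sockets = len({listener.fileno(), conn1.fileno(), conn2.fileno()})
# ... all bound to the same local port.
local_ports = {s.getsockname()[1] for s in (listener, conn1, conn2)}

print(distinct_sockets, local_ports == {port})

for s in (client1, client2, conn1, conn2, listener):
    s.close()
```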
(2) How are sockets and ports internally manipulated. Are sockets a file ?
That's totally up to the OS; and different OS's do it different ways. It's unclear what you mean by "a file" in this question, but in general sockets do not have anything to do with the filesystem. On the other hand, one feature of Unix-style OS's is that socket descriptors are also usable in much the same way that filesystem file descriptors are -- i.e. you can pass them to read()/write()/select(), etc and get useful results. Other OS's, such as Windows, do not support that feature and for them you must use a completely separate set of function calls for sockets vs files.
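A small sketch of the Unix behaviour described above, assuming Python on a Unix-like system (it is not expected to work on Windows): the generic descriptor calls os.write() and os.read() are applied directly to a socket's file descriptor. The socketpair() call is used only to get two connected sockets without any network setup.

```python
import os
import socket

# Two connected sockets in one process (Unix-like systems).
left, right = socket.socketpair()
left.settimeout(2.0)
right.settimeout(2.0)

# Write with the generic file-descriptor call, read with the socket call.
os.write(left.fileno(), b"written with os.write")
received = right.recv(1024)

# Send with the socket call, read back with the generic descriptor call.
right.sendall(b"sent with sendall")
read_back = os.read(left.fileno(), 1024)

print(received, read_back)

left.close()
right.close()
```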
(3) How is data sent when we send it using an application ?
The application calls the send() function (or a similar function such as sendto()), passes in the relevant socket descriptor along with a pointer to the data it wants to send, and then it is up to the network stack to copy that data into a packet and deliver it to the appropriate networking device for transmission.
(4) If sockets are there then why do we use port numbers ?
Because you need a way to communicate with particular programs on other computers, and computer A has no way of knowing what sockets are present (if any) on computer B. But port numbers are fixed, so it is possible for programmers to use them as a rendezvous point for communication -- for example, your web browser knows that a web server is almost certain to be listening for incoming HTTP requests on port 80 whenever the server is running, so it can send its requests to port 80 with a reasonable expectation of getting a useful response back. If it had to specify a socket as a target instead, what would it specify? The server's socket numbers are arbitrary and likely to be different every time the server runs.
(1) What is the difference between these two ?
(2)How are sockets and ports internally manipulated. Are sockets a file ?
A socket is (IP+Port):
A socket is like a telephone (i.e. end to end device for communication)
IP is like your telephone number (i.e. address of your socket)
Port is like the person you want to talk to (i.e. the service you want to order from that address)
A socket belongs to a process. On Linux the process refers to it through a file descriptor, much as it does for an open file.
(3) How is data sent when we send it using an application ?
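Data is sent by converting it to bytes. Multi-byte values can be ordered little-endian or big-endian, and the two ends must agree on the order (network protocols conventionally use big-endian, "network byte order"), so you have to take this into consideration when coding.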
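To make the byte-ordering point concrete, here is a short Python sketch using the standard struct module. The value 0x12345678 is an arbitrary example; "!" selects network (big-endian) order and "<" selects little-endian.

```python
import struct

value = 0x12345678

# Same 32-bit integer, two different byte layouts.
big = struct.pack("!I", value)     # network byte order (big-endian)
little = struct.pack("<I", value)  # little-endian

# The receiver must unpack with the same order the sender used.
decoded = struct.unpack("!I", big)[0]
misread = struct.unpack("<I", big)[0]  # wrong order gives a different number

print(big.hex(), little.hex(), decoded == value, misread == value)
```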
(4) If sockets are there then why do we use port numbers ?
A socket is (address + port) that means the person you want to talk to (port) can be reachable from many telephone numbers (IPs) and thus from many sockets (that does not mean that the person on one telephone number will reply to you the same as the one in the other telephone number because his job here/there may be different).

ZMQ Socket solution for a 1-to-1 two-way communication with connection re-establishment?

I'm looking for a ZMQ-capable solution for communication between a bound endpoint that is connected to 0 or 1 peers and no more than that. The communication is two-way, and the connection can be ended or severed at any point in time; the connection can then be re-established with either a new peer or the same peer. It doesn't matter if the bound endpoint blocks or doesn't block if it doesn't have a peer on the other side.
What ZMQ socket pair would suit this use case the best? I was initially thinking REP/REQ, but the socket pair allows for multiple REQs to connect to one REP, which I don't want; it also will need to handle the "I'm waiting for a recv/I'm going to send something" lockstep paradigm when someone disconnects. PAIR also seems bad because it doesn't naturally handle the reconnect, but it has the "0 or 1 peer" restriction I want.
Any suggestions?
The short answer: Unfortunately there isn't a pattern that exactly fits your needs out of the box.
The closest pattern is the ZMQ PAIR to PAIR pattern. However it has some limitations in the following ways:
"ZMQ_PAIR sockets are designed for inter-thread communication across the zmq_inproc(7) transport and do not implement functionality such as auto-reconnection. ZMQ_PAIR sockets are considered experimental and may have other missing or broken aspects."
ROUTER and DEALER is the most flexible pattern. You can control it to set the restrictions you need.

kernel-based (Linux) data relay between two TCP sockets

I wrote a TCP relay server which works like a peer-to-peer router (supernode).
The simplest case is two open sockets with data relayed between them:
clientA <---> server <---> clientB
However, the server has to serve about 2000 such A-B pairs, i.e. 4000 sockets...
There are two well known data stream relay implementations in userland (based on socketA.recv() --> socketB.send() and socketB.recv() --> socketA.send()):
using the select / poll functions (non-blocking method)
using threads / forks (blocking method)
I used threads, so in the worst case the server creates 2*2000 threads! I had to limit the stack size and it works, but is it the right solution?
Core of my question:
Is there a way to avoid active data relaying between two sockets in userland?
It seems there is a passive way. For example, I can create a file descriptor from each socket, create two pipes and use dup2(), the same method as for stdin/stdout redirection. Then the two threads are no longer needed for the data relay and can be finished/closed.
The question is if the server should ever close sockets and pipes and how to know when the pipe is broken to log the fact?
I've also found "socket pairs" but I am not sure about it for my purpose.
What solution would you advise to off-load the userland and limit the number of threads?
Some extra explanations:
The server has defined static routing table (eg. ID_A with ID_B - paired identifiers). Client A connects to the server and sends ID_A. Then the server waits for client B. When A and B are paired (both sockets opened) the server starts the data relay.
Clients are simple devices behind symmetric NAT therefore N2N protocol or NAT traversal techniques are too complex for them.
Thanks to Gerhard Rieger I have the hint:
I am aware of two kernel space ways to avoid read/write, recv/send in
user space:
sendfile
splice
Both have restrictions regarding type of file descriptor.
dup2 will not help to do something in kernel, AFAIK.
Man pages: splice(2) vmsplice(2) sendfile(2) tee(2)
Related links:
Understanding sendfile() and splice()
http://blog.superpat.com/2010/06/01/zero-copy-in-linux-with-sendfile-and-splice/
http://yarchive.net/comp/linux/splice.html (Linus)
C, sendfile() and send() difference?
bridging between two file descriptors
Send and Receive a file in socket programming in Linux with C/C++ (GCC/G++)
http://ogris.de/howtos/splice.html
OpenBSD implements SO_SPLICE:
relayd asiabsdcon2013 slides / paper
http://www.manualpages.de/OpenBSD/OpenBSD-5.0/man2/setsockopt.2.html
http://metacpan.org/pod/BSD::Socket::Splice .
Does Linux support something similar, or is writing my own kernel module the only solution?
TCPSP
SP-MOD described here
TCP-Splicer described here
L4/L7 switch
HAProxy
Even for loads as tiny as 2000 concurrent connections, I'd never go with threads. They have the highest stack and switching overhead, simply because it's always more expensive to ensure that you can be interrupted anywhere than when you can only be interrupted at specific places. Just use epoll() and splice (if your sockets are TCP, which seems to be the case) and you'll be fine. You can even make epoll work in edge-triggered mode, where you only register your fds once.
If you absolutely want to use threads, use one thread per CPU core to spread the load, but if you need to do this, it means you're playing at speeds where affinity, RAM location on each CPU socket etc. play a significant role, which doesn't seem to be the case in your question. So I'm assuming that a single thread is more than enough in your case.
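A rough single-threaded sketch of the event-driven approach, assuming Python: the standard selectors module picks epoll on Linux, and each registered socket carries its peer as attached data. Note that this sketch copies data through userland with recv()/sendall() and does not use splice; it only illustrates replacing two threads per pair with one readiness loop. The function relay_once and the socketpair() stand-ins for accepted client connections are hypothetical, and a production relay would use non-blocking sockets with output buffering.

```python
import selectors
import socket


def relay_once(selector, timeout=1.0):
    """Handle one batch of readiness events, copying data to each peer."""
    for key, _events in selector.select(timeout):
        src, dst = key.fileobj, key.data
        chunk = src.recv(65536)
        if chunk:
            dst.sendall(chunk)  # may block; a real relay would buffer instead
        else:
            # One side closed: stop watching the pair and close both sockets.
            for sock in (src, dst):
                selector.unregister(sock)
                sock.close()
            return  # remaining events may refer to the sockets just closed


# Stand-ins for the accepted connections of client A and client B.
a_client, a_server = socket.socketpair()
b_server, b_client = socket.socketpair()
a_client.settimeout(2.0)
b_client.settimeout(2.0)

sel = selectors.DefaultSelector()
sel.register(a_server, selectors.EVENT_READ, data=b_server)  # A -> B
sel.register(b_server, selectors.EVENT_READ, data=a_server)  # B -> A

a_client.sendall(b"hello B")
relay_once(sel)
from_a = b_client.recv(1024)

b_client.sendall(b"hello A")
relay_once(sel)
from_b = a_client.recv(1024)

print(from_a, from_b)
```

With 2000 pairs the same loop would simply have 4000 registrations, still on one thread.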

At the level of IP, does "leave the connection open" have a specific technical meaning - such as intermediate gateways storing an IP map entry?

I am an experienced socket-level programmer in C++, but I do not understand what happens at the IP network level when a socket connection is left open (vs. being closed by calling the close function on the socket from within code).
I have studied the IP header and tried to understand if leaving a socket open has any implications at the IP level.
At the TCP level, leaving a socket open could make sense to me, because perhaps that means the "sequence number" field in the TCP header continues to increment. However, that would be a purely endpoint-based implementation, and therefore could not cut down on transit time for TCP packets. It is my understanding that leaving a connection open generally means that transit time between endpoints across the internet is decreased for packets.
The question is, does it mean anything at the IP level to leave a socket connection open?
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
(Perhaps the overhead of DNS lookup is also avoided in this fashion.)
Am I correct in guessing that "leaving a connection open" corresponds to map entries remaining in place on intermediate IP gateways (which speeds up packet transfer)?
Direct answer: No.
Your question suggests that you don't fully understand the purpose of TCP, which is to establish a data stream between two hosts. Keeping that in mind, the purpose of leaving a connection open should be obvious: if you close the connection, the stream will end.
The status of a TCP connection is not visible on the IP level; it's only of relevance to TCP. With the exception of NAT gateways, intermediate hosts do not generally keep track of the status of TCP connections passing through them. (In many cases, it'd be impossible for them to do so -- large routers have far more connections running through them than they could possibly track.)
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
This guess is incorrect. A router will have some sort of algorithm for picking a route based on the destination IP, based on a set of routing tables it keeps internally. Read up on BGP for details on how this is determined on large routers; on smaller routers, the routing table is typically defined by the administrator.
First of all, let's clear up a misconception:
that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
Routers never "broadcast to all connected gateways" in order to determine the next hop. If a packet arrives and the router does not already know how to route it, the packet is simply dropped (possibly with an ICMP error message being sent back to the source). The job of the routing protocols that run on routers is to prepopulate the router's routing table with routes learned from peers so that they are then prepared to receive packets and route them.
Also, "the complete IP network path" is not well-defined. The network path can change at any time as links fail on the network or new links become available. It can even change from one packet to the next in the absence of routing changes due to load balancing.
Back to your question: no, whether or not a socket is closed has no impact on IP. IP is stateless in the sense that every packet is self-contained and routed independently.
Whether or not a socket is closed does make a difference to TCP, but, as you note, that concerns only the two nodes at the endpoints of the connection.
The impact of "leaving a connection open" on speed, such as it is, is that establishing a connection in TCP requires a round-trip. But more to the point, a connection also has semantic meaning to most protocols running on TCP. Two bits of data sent on the same connection are related in a way that two bits of data sent on different connections are not.

Sockets Async Connection

I am new to async socket connections. Can you please explain how this technology works?
There's an existing application (server) which requires socket connections to transmit data back and forth. I have already created my application (.NET), but the server application doesn't seem to understand the XML data that I am sending. My documentation gives me two ports, one to send and another one to receive.
I need to be sure that I understand how this works.
I got the IP addresses and also the two Ports to be used.
A socket is the most "raw" way you can use to send byte-level TCP and UDP packets across a network.
For example, your browser uses a socket TCP connection to connect to the StackOverflow web server on port 80. Your browser and the server exchange commands and data according to an agreed-on structure/protocol (in this case, HTTP). An asynchronous socket is no different than a synchronous socket except that it does not block the thread that's using it.
This is really not the most ideal way to work (check and see if your server/vendor application supports SOAP/Web Services, etc), but if this is really the only way, there could be a number of reasons why it's failing. To name a few...
Not actually getting connected or sending data. Run a test using WinsockTool (http://www.isatools.org/tools/winsocktool.msi) and simulate your client first to make sure the server is working as expected.
Encoding incorrect - You're sending raw bytes across the network... Make sure you're using the correct encoding to convert your XML into bytes (ASCII, UTF8, etc).
Buffer Length - Your sending buffer (the amount of data you can transmit in one shot) may be too small or the server may expect a content of a certain length, and your XML could be getting truncated.
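The encoding and truncation points can be sketched as follows. This is Python rather than .NET, and the 4-byte big-endian length prefix is purely an assumed framing for illustration; the real server's expected framing and encoding must come from its documentation. The helper names send_message, recv_exact and recv_message are hypothetical.

```python
import socket


def send_message(sock, text, encoding="utf-8"):
    """Encode text explicitly and send it with an assumed 4-byte length prefix."""
    payload = text.encode(encoding)
    header = len(payload).to_bytes(4, "big")
    sock.sendall(header + payload)  # sendall keeps sending until all bytes go out


def recv_exact(sock, count):
    """Read exactly count bytes; a single recv() may return fewer."""
    chunks = []
    remaining = count
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed before the full message arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_message(sock, encoding="utf-8"):
    """Read the length prefix, then the payload, and decode it."""
    length = int.from_bytes(recv_exact(sock, 4), "big")
    return recv_exact(sock, length).decode(encoding)


# Connected pair standing in for the client and the server.
client, server = socket.socketpair()
xml = '<?xml version="1.0" encoding="UTF-8"?><order id="7">café</order>'
send_message(client, xml)
received = recv_message(server)
print(received == xml)

client.close()
server.close()
```

The non-ASCII character in the sample shows why the encoding matters: its byte length differs from its character count, so lengths must be computed on the encoded bytes.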
let's break a misconception... sockets are FULL-DUPLEX: you connect to a server using one port, then you can send AND receive data through the same socket, no need for 2 port numbers. (actually, there is a local port on which your socket receives data, but it is: 1. assigned automatically when the socket connects (unless told otherwise) and 2. of no use in the function calls to receive data)
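A minimal loopback sketch of that full-duplex behaviour, assuming Python with the server and client simulated in one script (all names are illustrative): the client connects to a single server port, its own local port is picked by the OS, and the same client socket is used to send the request and receive the reply.

```python
import socket

# Server side: one listening port, chosen by the OS for this demo.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Client side: only the server's port is specified.
client = socket.create_connection(("127.0.0.1", server_port), timeout=2.0)
conn, peer = server.accept()
conn.settimeout(2.0)

# The client's local port was assigned automatically by the OS.
client_local_port = client.getsockname()[1]

# Send and receive through the same socket at each end.
client.sendall(b"request")
request = conn.recv(1024)
conn.sendall(b"reply")
reply = client.recv(1024)

print(request, reply, peer[1] == client_local_port)

for s in (client, conn, server):
    s.close()
```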
so you tell us that your documentation gives you 2 port numbers... i assume that the "server" is an already existing in-house application, and you are trying to talk to it. if the doc lists 2 ports, then you will need 2 sockets: one for sending and another one for receiving. now i would suggest you first use a synchronous socket before trying the async way: a synchronous socket is less error-prone for a first test.
(by the way, let's break another misconception: if well coded, once a server listens on a port, it can receive any number of connections through the same port number, no need to open 2 listening ports to accept 2 connections... sorry for the re-alignment, but i've seen those 2 errors committed enough times that it gives me an urge to kill)