Socket Programming: bind() system call - sockets

While studying computer networks as a course subject, my understanding was that the operating system distinguishes a packet by its destination port and delivers it to the application running on that port.
Later I learned that we can connect to two different destinations (DestinationIP:DestinationPort) from the same source (SourceIP:SourcePort).
tcp 0 0 192.168.1.5:60000 199.7.57.72:80 ESTABLISHED 1000 196102 10179/firefox
tcp 0 0 192.168.1.5:60000 69.192.3.67:443 ESTABLISHED 1000 200361 10179/firefox
tcp 0 0 192.168.1.5:60000 69.171.234.18:80 ESTABLISHED 1000 196107 10179/firefox
tcp 0 0 192.168.1.5:60000 107.21.19.182:22 ESTABLISHED 1000 196399 10722/ssh
tcp 0 0 192.168.1.5:60000 69.171.234.18:443 ESTABLISHED 1000 201792 10179/firefox
tcp 0 0 192.168.1.5:60000 69.192.3.34:443 ESTABLISHED 1000 200349 10179/firefox
tcp 0 0 127.0.0.1:4369 127.0.0.1:51889 ESTABLISHED 129 12036 1649/epmd
tcp 0 0 192.168.1.5:60000 69.192.3.58:443 ESTABLISHED 1000 200352 10179/firefox
tcp 0 0 192.168.1.5:60000 74.125.236.88:80 ESTABLISHED 1000 200143 10179/firefox
tcp 0 0 192.168.1.5:60000 174.122.92.78:80 ESTABLISHED 1000 202935 10500/telnet
tcp 0 0 192.168.1.5:60000 74.125.236.87:80 ESTABLISHED 1000 201600 10179/firefox
Going a little deeper, I learned that if an application uses the bind() system call to bind a socket descriptor to a particular IP and port combination, then we can't use the same port again. Otherwise, if a port is not bound to any socket descriptor, we can use the same port and IP combination again to connect to a different destination.
I read in the man page of the bind() syscall that
bind() assigns the address specified by addr to the socket referred to by the file descriptor sockfd.
My questions are:
When we don't call bind() while writing a client program, as is usual, how does the OS automatically select the port number?
When two different applications use the same port and IP combination to connect to two different servers and those servers reply, how does the OS find out which packet needs to be delivered to which application?

When we don't call bind() while writing a client program, as is usual,
how does the OS automatically select the port number?
The OS picks a random unused port (not necessarily the "next" one).
how does the OS find out which packet needs to be delivered to
which application?
Each TCP connection is identified by a 4-tuple: (src_addr, src_port, dst_addr, dst_port) which is unique and thus enough to identify where each segment belongs.
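The 4-tuple is easiest to observe from the listening side, where every accepted connection shares the same local address and port and differs only in the remote half. The following is a minimal sketch using loopback sockets; the function name accept_two_clients is made up for illustration and is not taken from the answer above.

```python
import socket


def accept_two_clients():
    """Accept two loopback clients on one listening port and return the
    (local, remote) address pair of each accepted connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
    listener.listen(2)
    server_addr = listener.getsockname()

    clients, accepted = [], []
    for _ in range(2):
        clients.append(socket.create_connection(server_addr))
        conn, _peer = listener.accept()
        accepted.append(conn)

    # Both accepted sockets report the same local (IP, port);
    # only the remote (IP, port) tells the connections apart.
    tuples = [(conn.getsockname(), conn.getpeername()) for conn in accepted]

    for s in clients + accepted + [listener]:
        s.close()
    return tuples


if __name__ == "__main__":
    for local, remote in accept_two_clients():
        print("local", local, "remote", remote)
```

The same lookup applies in the other direction: a client-side connection is found by matching all four values of an incoming segment, so a shared source port is not ambiguous as long as the destinations differ.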
EDIT
When we don't call bind() while writing a client program, as is usual,
how does the OS automatically select the port number?
Sometime before "connecting" in the case of a TCP socket. For example, Linux has the function inet_csk_get_port to get an unused port number. Look for inet_hash_connect in tcp_v4_connect.
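The automatic assignment can be observed from user space: a socket that never calls bind() has a local port by the time connect() returns. This is a small sketch against a loopback listener; connect_without_bind is a hypothetical helper name, and which port gets chosen depends on the OS.

```python
import socket


def connect_without_bind():
    """Connect a client socket without calling bind() and return the
    local port the OS assigned to it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # the listener also lets the OS pick a port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())  # no bind(): the OS selects the source port here
    local_port = client.getsockname()[1]

    client.close()
    listener.close()
    return local_port


if __name__ == "__main__":
    print("OS-assigned source port:", connect_without_bind())
```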

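For 1: The OS just picks an available port from its ephemeral port range (the exact selection order depends on the OS).
For 2: It is done based on the destination port of the reply, which is the client's port. Client applications normally connect to the same server from different client ports; when connections do share a client port, the remote address and port tell them apart.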

I think that for a client program the OS maintains a table that maps each socket fd (opened by the client) to the server IP and port once the TCP connection is established. Whenever the server replies, the OS looks up the socket fd for that server IP and port and writes the data to it, so the server's reply becomes available to the client on that particular socket fd.

Related

How many sockets are currently being used on Windows 10

I'm trying to find out how many sockets are currently being used on my machine.
If it's possible, I need an API that returns the exact number of sockets being used at any time.
I have tried TCPView and several variations of the netstat command, but all I get is a list of connections.
Just the amount?
netstat -a -n | find /i "127.0.0.1" | find /I "ESTABLISHED" > t.txt && powershell -command "& Get-Content "t.txt" | Measure-Object -Line"
Replace 127.0.0.1 if needed. On my box it resulted in
Lines Words Characters Property
----- ----- ---------- --------
24
Looking at t.txt, it will show something like:
TCP 127.0.0.1:49672 127.0.0.1:49673 ESTABLISHED
TCP 127.0.0.1:49673 127.0.0.1:49672 ESTABLISHED
TCP 127.0.0.1:49674 127.0.0.1:49675 ESTABLISHED
TCP 127.0.0.1:49675 127.0.0.1:49674 ESTABLISHED
TCP 127.0.0.1:57354 127.0.0.1:57355 ESTABLISHED
TCP 127.0.0.1:57355 127.0.0.1:57354 ESTABLISHED
TCP 127.0.0.1:57356 127.0.0.1:57357 ESTABLISHED
TCP 127.0.0.1:57357 127.0.0.1:57356 ESTABLISHED
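The same count can be taken without the temporary file by filtering the netstat text in a script. The sketch below assumes the four-column Windows layout shown above (protocol, local address, foreign address, state); count_established is a hypothetical helper, not part of any Windows API.

```python
def count_established(netstat_output, address):
    """Count TCP lines in netstat text that are ESTABLISHED and mention
    the given address as either the local or the foreign endpoint."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: Proto, Local Address, Foreign Address, State
        if len(fields) < 4:
            continue
        proto, local, foreign, state = fields[:4]
        if proto.upper() != "TCP" or state.upper() != "ESTABLISHED":
            continue
        if address in local or address in foreign:
            count += 1
    return count


SAMPLE = """\
TCP 127.0.0.1:49672 127.0.0.1:49673 ESTABLISHED
TCP 127.0.0.1:49673 127.0.0.1:49672 ESTABLISHED
TCP 0.0.0.0:135 0.0.0.0:0 LISTENING
"""

if __name__ == "__main__":
    print(count_established(SAMPLE, "127.0.0.1"))
```

To use it on live data, the text could come from running netstat -a -n (for example through Python's subprocess module) instead of the SAMPLE string.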
If you are just interested in the number of TCP/UDP sockets in use, have a look at the following APIs:
GetTcpStatistics(), GetTcpStatisticsEx(), GetTcpStatisticsEx2()
GetUdpStatistics(), GetUdpStatisticsEx(), GetUdpStatisticsEx2()
But those don't really tell you anything useful, and the stats are global, not per-interface. There are other APIs that can enumerate the available sockets and retrieve actual details about them (statuses, port numbers, etc), which you can use to fine-tune your searching as needed (eg, ESTABLISHED on IP x Port y):
GetTcpTable(), GetTcpTable2(), GetExtendedTcpTable()
GetUdpTable(), GetUdp6Table(), GetExtendedUdpTable()

TCP socket state becomes persist after changing IP address even though keep-alive was configured earlier

I ran into a problem with TCP socket keepalive.
TCP keep-alive is enabled and configured after the socket is connected, and the system has its own TCP keep-alive configuration.
'ss -to' can show the keep-alive information of the connection.
The network interface is a PPPoE device; if we ifup the interface, it gets a new IP address, and the old TCP connection stays established until the keep-alive timeout.
But sometimes 'ss -to' shows that the TCP connection's timer becomes 'persist', and the connection then takes a long time (about 15 minutes) to close.
Following is the result of 'ss -to':
ESTAB 0 591 172.0.0.60:46402 10.184.20.2:4335 timer:(persist,1min26sec,14)
The source address is '172.0.0.60', but the network interface's actual address has been updated to '172.0.0.62'.
This is the correct result of 'ss -to':
ESTAB 0 0 172.0.0.62:46120 10.184.20.2:4335 timer:(keepalive,4.480ms,0)
I don't know why the "timer" changes to 'persist', which disables keep-alive.
In short: TCP keepalive is only relevant if the connection is idle, i.e. no data to send. If instead there are still data to send but sending is currently impossible due to missing ACK or a window of 0 then other timeouts are relevant. This is likely the problem in your case.
For the deeper details see The Cloudflare Blog: When TCP sockets refuse to die.
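One Linux option that bounds how long unacknowledged or unsendable data may sit in the send queue is TCP_USER_TIMEOUT, described in tcp(7). Suggesting it here is an assumption about what might help in this situation, not something stated in the answer above, and its exact interaction with the persist timer can depend on the kernel version. A minimal sketch, with set_user_timeout as a hypothetical helper name:

```python
import socket

# Linux-specific option; Python exposes it as socket.TCP_USER_TIMEOUT where available.
TCP_USER_TIMEOUT = getattr(socket, "TCP_USER_TIMEOUT", None)


def set_user_timeout(sock, timeout_ms):
    """Ask the kernel to abort the connection if sent data stays
    unacknowledged (or queued data stays unsent) for timeout_ms.
    Returns the value read back, or None if the option is unsupported."""
    if TCP_USER_TIMEOUT is None:
        return None
    sock.setsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms)
    return sock.getsockopt(socket.IPPROTO_TCP, TCP_USER_TIMEOUT)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print("TCP_USER_TIMEOUT (ms):", set_user_timeout(s, 30000))
    s.close()
```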

Can I write() to a socket just after connect() call, but before TCP connection established?

My experiment showed that I can write to a non-blocking socket just after the connect() call, with no TCP connection established yet, and the written data is correctly received by the peer after the connection is established (asynchronously). Is this guaranteed on Linux / FreeBSD? I mean, will write() return > 0 when the connection is still in progress? Or maybe I was lucky and the TCP connection was successfully established between the connect() and write() calls?
The experiment code:
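#include <arpa/inet.h>   // htons, htonl, struct sockaddr_in
#include <fcntl.h>       // fcntl, O_NONBLOCK
#include <string.h>      // memset
#include <sys/socket.h>  // socket, connect
#include <unistd.h>      // write
int fd = socket(PF_INET, SOCK_STREAM, 0);
fcntl(fd, F_SETFL, O_NONBLOCK);  // switch the socket to non-blocking mode
struct sockaddr_in addr;
memset(&addr, 0, sizeof(addr));
addr.sin_family = AF_INET;
addr.sin_port = htons(_ip_port.port);         // _ip_port is defined elsewhere in the asker's code
addr.sin_addr.s_addr = htonl(_ip_port.ipv4);
int res = connect(fd, (struct sockaddr*)&addr, sizeof(addr));
// HERE: res == -1, errno == 115 (EINPROGRESS)
int r = ::write(fd, "TEST", 4);
// HERE: r == 4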
P.S.
I process multiple listening and connecting sockets (incoming and outgoing connections) in a single thread and manage them with epoll. Usually, when I want to create a new outgoing connection, I call non-blocking connect(), wait for EPOLLOUT (the epoll event), and then write() my data. But I noticed that I can begin writing before the EPOLLOUT and get the appropriate result. Can I trust this approach, or should I stick with my old-fashioned approach?
P.P.S.
I repeated my experiment with a remote host with a latency of 170 ms and got different results: the write() (just after connect()) returned -1 with errno == EAGAIN. So, yes, my first experiment was not fair (connecting to a fast localhost), but I still think the "write() right after connect()" approach can be used: if write() returns -1 with EAGAIN, I wait for EPOLLOUT and retry writing. But I agree, this is a dirty and useless approach.
Can I write() to a socket just after connect() call, but before TCP connection established?
Sure, you can. It's just likely to fail.
Per the POSIX specification of write():
[ECONNRESET]
A write was attempted on a socket that is not connected.
Per the Linux man page for write():
EDESTADDRREQ
fd refers to a datagram socket for which a peer address has
not been set using connect(2).
If the TCP connect has not completed, your write() call will fail.
At least on Linux, the socket is marked as not writable until the [SYN, ACK] is received from the peer. This means the system will not send any application data over the network until the [SYN, ACK] is received.
If the socket is in non-blocking mode, you must use select/poll/epoll to wait until it becomes writable (otherwise write calls will fail with EAGAIN and no data will be enqueued). When the socket becomes writable, the kernel has usually already sent an empty [ACK] message to the peer before the application has had time to write the first data, which results in some unnecessary overhead due to the API design.
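The wait-for-writable pattern described above can be sketched as follows. This is an illustration under Linux-like semantics, using select() for brevity where the question uses epoll; connect_then_send and its timeout parameter are hypothetical names. Checking SO_ERROR after the socket becomes writable is what distinguishes a completed connection from a failed one.

```python
import errno
import os
import select
import socket


def connect_then_send(address, payload, timeout=5.0):
    """Start a non-blocking connect, wait until the socket is writable,
    confirm the connect succeeded, then send payload.
    Returns the number of bytes accepted by send()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex(address)  # usually EINPROGRESS on a non-blocking socket
        if rc not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            raise OSError(rc, os.strerror(rc))

        # Writability signals that the handshake finished, successfully or not.
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            raise TimeoutError("connect did not complete in time")

        # SO_ERROR holds the result of the asynchronous connect.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            raise OSError(err, os.strerror(err))

        return sock.send(payload)
    finally:
        sock.close()
```

With epoll the structure is the same: register the socket for EPOLLOUT after connect(), and read SO_ERROR when the event fires before writing.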
What appears to work is, after calling connect on a non-blocking socket and getting EINPROGRESS, to set the socket to blocking and then start writing data. The kernel will then internally wait until the [SYN, ACK] is received from the peer and send the application data and the initial ACK in a single packet, which avoids that empty [ACK]. Note that the write call will block until the [SYN, ACK] is received and will, for example, return -1 with errno ECONNREFUSED, ETIMEDOUT, etc. if the connection fails. This approach, however, does not work in WSL 1 (Windows Subsystem for Linux), which just fails with EPIPE immediately (no SIGPIPE though).
In any case, not much can be done to eliminate this initial round-trip time due to the design of TCP. However, if the TCP Fast Open (TFO) feature is supported by both endpoints and you can accept its security issues, this round trip can be eliminated. See https://lwn.net/Articles/508865/ for more info.

Socat not closing tcp connection

I use socat 1.7.3.1-r0 and run the following command on an Alpine 3.3 Linux server:
socat -d -d -d PTY,link=/dev/ttyFOOBAR,echo=0,raw,unlink-close=0 TCP-LISTEN:7000,forever,reuseaddr
Socat will listen for clients and create a bidirectional communication by passing data from the virtual serial port /dev/ttyFOOBAR to the client and back again over TCP. Once the client disconnects socat should exit.
When such a connection is established socat logs the following:
I socat by Gerhard Rieger - see www.dest-unreach.org
I This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit. (http://www.openssl.org/)
I This product includes software written by Tim Hudson (tjh#cryptsoft.com)
I setting option "symbolic-link" to "/dev/ttyFOOBAR"
I setting option "echo" to 0
I setting option "raw"
I setting option "unlink-close" to 0
I openpty({5}, {6}, {"/dev/pts/3"},,) -> 0
N PTY is /dev/pts/3
I setting option "forever" to 1
I setting option "so-reuseaddr" to 1
I socket(2, 1, 6) -> 7
I starting accept loop
N listening on AF=2 0.0.0.0:7000
I accept(7, {2, AF=2 CLIENT_IP:PORT}, 16) -> 8
N accepting connection from AF=2 CLIENT_IP:PORT on AF=2 172.20.0.2:7000
I permitting connection from AF=2 CLIENT_IP:PORT
I close(7)
I resolved and opened all sock addresses
N starting data transfer loop with FDs [5,5] and [8,8]
ss command on the server prints:
Netid State Recv-Q Send-Q Local Address:Port Peer Address:Port
tcp ESTAB 0 0 172.20.0.2:7000 CLIENT_IP:PORT
The problem is that when I disconnect the client (by switching it off), the TCP connection is still established and no additional logging comes from socat. ss still shows the connection as ESTAB. Any ideas why? When I connect the client again, the following appears in the logs:
W read(8, 0x7fa8f48c4020, 8192): Connection reset by peer
N socket 2 to socket 1 is in error
N socket 2 (fd 8) is at EOF
I poll timed out (no data within 0.500000 seconds)
I close(5)
I shutdown(8, 2)
I shutdown(8, 2): Socket not connected
N exiting with status 0
But why does this happen on connect instead of disconnect?
If there is no data to send or receive on a socket and you cut the underlying connection, neither side is aware of it until it attempts to send data. Normally, that would be application-level data, but at the protocol level you can enable TCP keepalives to emulate flowing data whenever there is no real data.
According to the socat manpage you could try something like:
socat -d -d -d PTY,link=/dev/ttyFOOBAR,echo=0,raw,unlink-close=0 TCP-LISTEN:7000,forever,reuseaddr,keepalive,keepidle=10,keepintvl=10,keepcnt=2
(keepalive actually looks like the essential option but it is unclear what the defaults will be for the tuning options if unset.)
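For comparison, the same four settings can be applied to a socket directly. The sketch below assumes the Linux option names TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT and skips any that the platform does not expose; enable_keepalive is a hypothetical helper, and the defaults simply mirror the values in the socat command above.

```python
import socket


def enable_keepalive(sock, idle=10, interval=10, count=2):
    """Turn on TCP keepalive and, where the platform exposes them,
    set the idle time, probe interval (both in seconds) and probe count."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before the connection is dropped
    )
    for name, value in tuning:
        option = getattr(socket, name, None)  # Linux names; may be missing elsewhere
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("SO_KEEPALIVE:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

With these values a silent peer should be noticed after roughly idle + interval * count = 30 seconds of inactivity.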

FIN,ACK after PSH,ACK

I'm trying to implement a communication between a legacy system and a Linux system but I constantly get one of the following scenarios:
(The legacy system is server, the Linux is client)
Function recv(2) returns 0 (the peer has performed an orderly shutdown.)
> SYN
< SYN, ACK
> ACK
< PSH, ACK (the data)
> FIN, ACK
< ACK
> RST
< FIN, ACK
> RST
> RST
Function connect(2) returns -1 (error)
> SYN
< RST, ACK
When the server has sent its data, the client should answer with data, but instead I get a "FIN, ACK".
Why is it like this? How should I interpret this? I'm not that familiar with TCP at this level.
When the server has sent its data, the client should answer with data, but instead I get a "FIN, ACK". Why is it like this? How should I interpret this?
It could be that once the server has sent the data (line 4) the client closes the socket or terminates prematurely and the operating system closes its socket and sends FIN (line 5). The server replies to FIN with ACK but the client has ceased to exist already and its operating system responds with RST. (I would expect the client OS to silently ignore and discard any TCP segments arriving for a closed connection during the notorious TIME-WAIT state, but that doesn't happen for some reason.)
http://en.wikipedia.org/wiki/Transmission_Control_Protocol#Connection_termination:
Some host TCP stacks may implement a half-duplex close sequence, as Linux or HP-UX do. If such a host actively closes a connection but still has not read all the incoming data the stack already received from the link, this host sends a RST instead of a FIN (Section 4.2.2.13 in RFC 1122). This allows a TCP application to be sure the remote application has read all the data the former sent—waiting the FIN from the remote side, when it actively closes the connection. However, the remote TCP stack cannot distinguish between a Connection Aborting RST and this Data Loss RST. Both cause the remote stack to throw away all the data it received, but that the application still didn't read
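The orderly case, where the closing side has already read everything it received, can be reproduced on loopback. In this sketch (observe_orderly_shutdown is a hypothetical name) the client reads the server's data and then closes, so its stack sends a FIN and the server's next recv() returns an empty result, the Python equivalent of recv(2) returning 0.

```python
import socket


def observe_orderly_shutdown():
    """Server sends data, client reads it all and closes; return what the
    client received and what the server's next recv() returns."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _peer = listener.accept()

    server_side.sendall(b"data")
    received = client.recv(1024)         # the client consumes the pending data first
    client.close()                       # nothing unread remains, so a FIN is sent

    after_close = server_side.recv(1024)  # b"" means the peer shut down in an orderly way

    server_side.close()
    listener.close()
    return received, after_close


if __name__ == "__main__":
    print(observe_orderly_shutdown())
```

If the client closed without reading the pending data, a stack that behaves as the quoted passage describes would send a RST instead, and the server would typically see a connection reset error rather than an empty read.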
After FIN, PSH, ACK --> One transaction completed
Second request receiving but sending [RST] seq=140 win=0 len=0