Relation between sockets, ports and processes in Computer Networking [duplicate] - sockets

I am very new to TCP/IP networks and learning about sockets/ports. I have a few confusions. I am mentioning what I understand.
A node N1 has multiple processes running. Say a process P1 has some string that it wishes to send to some other Node N2. N1 will request the OS to create a socket, which is essentially like a network I/O streaming channel. Such a channel will be created and handed over to the process along with a socket descriptor. So, we can say that socket can be recognised in the world by the node i.e. IP of node + process which requested the socket. Hence, comes the concept of socket address which is basically IP of node + port address (used for identifying process). So, my doubts are:
From where comes the idea of ports here. Socket can be identified as IP of node + Process ID. Why ports are required to identify a process. Why can't the process descriptor be self sufficient. Why port address. Examples?
Why do we need to bind the socket with a socket address if the node has to just pass the data and nothing needs to be received. Binding of socket address essentially means "to start recognising socket with IP address of node + port address apart from its descriptor" which is useful for other nodes if they wish to send some data to Node N1. But what I think is that for any process in a node that wishes to communicate over the network, there should be one "global" socket which will not be bound. All processes will use it for sending data only. If any node wishes to receive data, it can have a separate socket which will be bound so that other nodes in the network can recognise that particular socket.
Where exactly does TCP/UDP fit in the picture? Can I have two ports which are like TCP port 3000 and UDP port 3000 i.e. separate ports with different transport protocol but same port numbers. Is this possible with sockets too?

So, we can say that socket can be recognised in the world by the node i.e. IP of node + process which requested the socket.
Not 'in the world'. Only within the localhost. The socket only exists within the localhost, and the process ID is only known within the localhost.
Hence, comes the concept of socket address which is basically IP of node + port address (used for identifying process)
No. The port identifies the service. The process implements the service.
From where comes the idea of ports here.
RFC 793.
Socket can be identified as IP of node + Process ID.
No they can't. A peer on another host has no way of getting a remote process ID. Some fixed operating-system-agnostic identifier is required. And a process can own many ports. The suggestion doesn't begin to make sense.
Why ports are required to identify a process.
Ports do not identify a process. The question doesn't make sense.
Why can't the process descriptor be self sufficient. Why port address.
Because the first question you asked is fallacious. This is just another version of it.
Why do we need to bind the socket with a socket address if the node has to just pass the data and nothing needs to be received.
Because connections are identified by address:port pairs.
Binding of socket address essentially means "to start recognising socket with IP address of node + port address apart from its descriptor" which is useful for other nodes if they wish to send some data to Node N1.
It is also rather useful for this node, to know where the incoming data should go.
But what I think is that for any process in a node that wishes to communicate over the network, there should be one "global" socket which will not be bound. All processes will use it for sending data only. If any node wishes to receive data, it can have a separate socket which will be bound so that other nodes in the network can recognise that particular socket.
Regardless of the invalidity and pointlessness of this scheme, your thoughts are 40 years too late.
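On the point about sending without binding: a socket that only sends still ends up with a local port, because the OS binds it implicitly on first use. Here is a minimal sketch of that, assuming a Unix-like system (where an unbound socket reports port 0) and using a hypothetical helper name, `show_implicit_bind`, that is not from the original discussion:

```python
import socket


def show_implicit_bind():
    """Send one UDP datagram without calling bind() and report the ports involved."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick any free port
    receiver.settimeout(2.0)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Assumption: on Unix-like systems an unbound socket reports port 0.
        port_before = sender.getsockname()[1]
        sender.sendto(b"hello", receiver.getsockname())
        # The OS has now bound the sender to an ephemeral port on its own.
        port_after = sender.getsockname()[1]
        _, (_, port_seen) = receiver.recvfrom(1024)
        return port_before, port_after, port_seen
    finally:
        sender.close()
        receiver.close()
```

The receiver sees the same ephemeral port the OS picked for the sender, which is the address any reply would have to be sent to.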
Can I have two ports which are like TCP port 3000 and UDP port 3000 i.e. separate ports with different transport protocol but same port numbers.
Yes.
Where exactly does TCP/UDP fit in the picture?
They implement ports.
Is this possible with sockets too?
I can't make any sense out of this question. All sockets are distinct from each other.
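For the "TCP port 3000 and UDP port 3000" part, here is a small sketch showing that one port number can be bound once for TCP and once for UDP at the same time. The function name and the retry loop are illustrative assumptions; the retry only covers the unlikely case that some other program already holds the UDP port the OS happened to pick for TCP.

```python
import socket


def bind_tcp_and_udp_same_number(attempts=5):
    """Bind a TCP socket and a UDP socket to the same port number on loopback."""
    for _ in range(attempts):
        tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            tcp.bind(("127.0.0.1", 0))       # the OS picks a free TCP port
            port = tcp.getsockname()[1]
            udp.bind(("127.0.0.1", port))    # same number, UDP namespace
            return port, tcp.getsockname()[1], udp.getsockname()[1]
        except OSError:
            continue  # the UDP port was taken by someone else; try another
        finally:
            tcp.close()
            udp.close()
    raise RuntimeError("could not find a port free for both TCP and UDP")
```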


are socket ports the same as regular ports [duplicate]

I read something I found contradictory with my current understanding of ports. If you google "how many ports does a server have", the first thing to come up states the following:
The server generally only ever uses one port, no matter how many clients are connected. It is the tuple of (client IP, client port,
server IP, server port) that must be unique for each TCP connection -
so the limit of 65535 ports is only relevant for how many connections
a single client can make to a single server.
I thought each time a client establishes a connection to a server, then a socket is created using a regular port for the connection between the two?
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?
I thought each time a client establishes a connection to a server, then a socket is created using a regular port for the connection between the two?
The term "port" in this context is being used to describe, essentially, an address. The port number, along with the IP address, uniquely identifies one endpoint of the network.
Not only does the server endpoint generally only use a single port number, it would be a lot more difficult to make connections to the server if it didn't, because what port number would the client endpoint use to request the connection? DNS allows a client to look up the IP address, if the IP address is not already known, but there's no such facility for port numbers. So the port number has to be known in advance.
So, no…it is not the case that each time a client makes a connection, a socket is created using a "regular port" for the connection between the two. There's no "regular port". There's just "port", all ports are the same, and they are simply a number that identifies the endpoint's address.
If no, does it mean that a server can have more clients connected to it, than the maximum amount of regular ports?
Yes, it can. On the server end, the port number is (generally) always the same. For example, an HTTP server will (generally) use port 80. The listening socket will have as its port number "80", as will the server-side socket for each connection.
The port number can be reused like this, because each socket has other identifying characteristics besides the IP address and port number. In particular, the server's listening socket is unique; there is only one socket on the server end that has that IP address, that port number, and which has no connections (i.e. is listening).
Once a connection is made, a new socket is created to represent that connection. And that socket can be uniquely identified, because unlike the listening socket, it does have a connection (i.e. a remote endpoint) associated with it, along with the IP address and port number. When the client endpoint sends data to the server, the network layer can tell which socket that data should be delivered to, because the data comes from a specific remote endpoint, which also has a unique IP address and port number.
The combination of the server's and client's unique IP addresses and port numbers uniquely identifies that connection, making it distinct from any other socket on the server that may have the same server-side endpoint's IP address and port number.
In the text you quoted, this part is describing exactly this distinct, unique identification of a socket:
It is the tuple of (client IP, client port, server IP, server port) that must be unique for each TCP connection
In this way, the server's IP address and port number can be used an indefinite number of times (not counting other constrained resources on the server, like memory and tables that hold the state of the network connections).
The limitation on port numbers only comes into play when trying to create additional listening sockets (for servers) or additional connections (for clients). Servers typically won't run out of port numbers unless they are implementing a protocol that requires the server to create a connection back to a client's listening socket (this is uncommon), and clients won't run out of port numbers unless they try to make a very large number of connections.
It is this latter limit that this part of the text you quoted is referring to:
the limit of 65535 ports is only relevant for how many connections a single client can make to a single server.
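This is easy to check on one machine. The sketch below (the function name `accept_many` is just an illustrative choice) opens one listening socket on loopback, connects a few clients to it, and collects the local and remote port of every accepted socket:

```python
import socket


def accept_many(n_clients=3):
    """Connect several clients to one listening port and record the port numbers."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
    listener.listen(n_clients)
    listen_port = listener.getsockname()[1]
    clients, accepted = [], []
    try:
        for _ in range(n_clients):
            clients.append(socket.create_connection(("127.0.0.1", listen_port)))
            conn, _ = listener.accept()
            accepted.append(conn)
        # Server-side port of each accepted socket.
        local_ports = [conn.getsockname()[1] for conn in accepted]
        # Client-side port of each connection, as seen by the server.
        remote_ports = [conn.getpeername()[1] for conn in accepted]
        return listen_port, local_ports, remote_ports
    finally:
        for sock in clients + accepted + [listener]:
            sock.close()
```

Every accepted socket reports the listening port as its local port; only the remote (client) ports differ, and that difference is what keeps each tuple unique.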

Understand sockets, Client-Server architecture & Clients differentiation

I've read a lot of theory about sockets and client-server connections on this forum, but some points remain blurry and some answers do not satisfy me completely.
Also, I'd like my words to be confirmed, completed or corrected:
1)_ A socket is made out of IP source (IP of the Client), Port source (a port automatically and randomly chosen by the OS between 1024 and 65535), IP destination (127.0.0.1 ? Something I don't get here), Port destination (developer-defined port for the server) and protocol type.
There may be something wrong in those lines already.
But considering it is true, how can the server differentiate two processes accessing the server from the same machine ? (That is, how can the developer tell them apart if he wants to prevent multiple accesses from the same machine ?)
The only difference would be the source port which is auto-filled by the OS. In this case, it would act like it was a totally different machine, right ?
2)_ I heard there was actually a pair of sockets. One generated by the Client, and one by the server.
Is there really a need for the server to have a second socket ? Is this socket a simple replica to keep a copy in the "Client currently connected"-list or is it a different socket, with different values ?
3)_ When should a Client "disconnect" ? At each query ? At the end of some process ? Other ?
Thanks for enlightenment !
1)_ A socket is made out of IP source (IP of the Client), Port source
(a port automatically and randomly chosen by the OS between 1024 and
65535), IP destination (127.0.0.1 ? Something I don't get here), Port
destination (developer-defined port for the server) and protocol
type.
I wouldn't say a "socket is made" out of those data points; rather a TCP-connection can be uniquely identified using just those data points:
1. Source IP - the IP address of the client computer
2. Source Port - the port number (on the client computer) that the client is sending packets from and receiving packets on
3. Destination IP - the IP address of the server computer
4. Destination Port - the port number (on the server computer) that the server is sending packets from and receiving packets on
5. Protocol type - what communications-protocol is in use (i.e. either TCP or UDP)
But considering it is true, how can the server differentiate two
processes accessing the server from the same machine ?
It can differentiate them because the 5-tuple (above) will be unique for each of the two connections. In particular, in the TCP packets the server receives from process #2, field #2 (Source Port) will be different from the value it has in the packets received from process #1.
The only difference would be the source port which is auto-filled by
the OS. In this case, it would act like it was a totally different
machine, right ?
The server can act however it was programmed to act -- but in most cases the server will be programmed not to care whether two client connections come from the same physical machine or not. To most servers, a client is a client, and a client's physical location is not that important.
Is there really a need for the server to have a second socket ? Is
this socket a simple replica to keep a copy in the "Client currently
connected"-list or is it a different socket, with different values ?
A socket is just a data structure that lives in a computer's memory to help it keep track of the current state associated with a particular network connection. Since both the client and the server need to keep track of their end of the connection, both the client and the server will have their own socket representing their endpoint. (Note the difference between a "TCP-connection", which you can imagine as a virtual wire running from one computer to another, and the two "sockets", which would be the virtual connectors at the ends of that wire that attach it to the client program on one end and the server program on the other.)
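To make the "two connectors on one wire" picture concrete, here is a minimal sketch that creates both ends of a TCP connection in one process and compares what each socket reports. The function name `endpoint_pair` is a hypothetical helper, and running both ends in one process on loopback is an assumption made for brevity:

```python
import socket


def endpoint_pair():
    """Create both ends of one TCP connection and report each socket's view of it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    try:
        return {
            "client_local": client.getsockname(),
            "client_remote": client.getpeername(),
            "server_local": server_side.getsockname(),
            "server_remote": server_side.getpeername(),
            # Two separate OS-level objects, one per end of the connection.
            "distinct_descriptors": client.fileno() != server_side.fileno(),
        }
    finally:
        for sock in (client, server_side, listener):
            sock.close()
```

The two sockets are different objects with mirrored values: what the client calls its local address is the server-side socket's remote address, and vice versa.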
3)_ When should a Client "disconnect" ? At each query ? At the
end of some process ? Other ?
Whenever it wants to; it's up to the programmer(*). There are startup/shutdown costs to opening and connecting a new socket, but there is also some ongoing memory and CPU overhead to keeping a socket open indefinitely, so the programmer will have to make a design decision about whether he wants to keep sockets open over extended periods, or not.
(*) Note that in a modern OS, if the client program exits or crashes, the socket will be automatically closed and the connection automatically disconnected by the OS.

how router handles 'well known' ports from host [closed]

Very short question. Let's say user1 is connected to the internet and running an HTTP server locally. He needs to set up port forwarding for this to work, redirecting all incoming requests from the public IP to the local IP's port 80.
My doubt is this: user1 opens Mozilla Firefox, which gets, let's say, port 12343 assigned by the OS.
Traffic then goes from 192.168.0.14:12343 to google.com:80. Sometimes our router changes the port to another port at the NAT. That is clear.
My question: is there any port forwarding set up at the router to route the packet, i.e. packets from google:80 to :12343? Please correct me if I am wrong at any of the protocol suite layers. I am new to this.
When a connection is established through NAT, the NAT maintains a mapping between the inside port and the outside port. That is, when a packet comes from outside to port 54321, the NAT knows to forward it to internal network IP 192.168.0.1, port 12345.
To explain further, let's delve into the details. Let's talk about transparent NAT. Transparent NATs are ones which do not require any special configuration of local software (unlike HTTP proxy servers, for instance). They usually serve as network gateways, so that the OS knows to route network traffic to such a gateway (almost all home routers work in this mode).
When someone opens a web page from a desktop - local address 192.168.1.1, local port 12345, remote address stackoverflow.com, remote port 80 - the OS directs the traffic to the network gateway (192.168.1.0).
The gateway sees the traffic as coming from 192.168.1.1, port 12345. On the packet, it replaces 192.168.1.1 with its outside IP (say, 2.2.2.2) and gives it a port - say, 54321. It also creates an entry in its mapping table, indicating that all traffic incoming from outside for port 54321 is to be forwarded to 192.168.1.1, port 12345. The StackOverflow server sees the traffic as coming from the gateway, and responds back to the gateway address and port. The gateway sees the response, consults the mapping table and forwards it to the local machine, where it is seen by the browser - and thus my answer is displayed on your screen.
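The mapping table described above can be modelled in a few lines. This is a toy sketch only: the class name `ToyNat` and its methods are made up for illustration, and a real NAT also tracks the protocol, the remote endpoint and entry timeouts, all of which are left out here.

```python
from itertools import count


class ToyNat:
    """Toy model of a NAT port-mapping table (illustration only)."""

    def __init__(self, public_ip, first_port=50000):
        self.public_ip = public_ip
        self._next_port = count(first_port)  # next free outside port
        self._out = {}  # (inside_ip, inside_port) -> outside_port
        self._in = {}   # outside_port -> (inside_ip, inside_port)

    def outbound(self, src_ip, src_port):
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        key = (src_ip, src_port)
        if key not in self._out:
            outside_port = next(self._next_port)
            self._out[key] = outside_port
            self._in[outside_port] = key
        return self.public_ip, self._out[key]

    def inbound(self, dst_port):
        """Find the inside endpoint for an incoming packet, or None if unmapped."""
        return self._in.get(dst_port)
```

With the numbers from the answer, `ToyNat("2.2.2.2", first_port=54321).outbound("192.168.1.1", 12345)` yields `("2.2.2.2", 54321)`, and a later `inbound(54321)` on the same object maps back to `("192.168.1.1", 12345)`. A packet arriving on a port with no entry has nowhere to go, which is why unsolicited inbound traffic needs an explicit port-forwarding rule.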
I think this has nothing to do with NAT. NAT just translates the internal local address (like 192.168.1.1) to an external global address (like 139.130.4.5). I hope you have adequate knowledge of the OSI model. Let me explain. When a packet reaches the transport layer, it is assigned a random port number (ranging from 0-65535), either TCP or UDP, by the OS. However, the OS can only use port numbers from 49152 to 65535, as several ports are registered or used for some specific process. A port is used to identify a service or a process. After the port number is added, the packets are given to the network layer, which adds the source address and destination address of the packet. Switching is a process that happens in the network layer. This switching mechanism is responsible for the source-to-destination delivery of packets. The Internet uses packet switching. When you send a packet with this switching mechanism, the packet gets routed through several switches between the source and destination. Every packet that is sent through these switches is routed based on a switching table or routing table. This table contains details such as the MAC address and the physical port of the switch through which the packet is received and sent.
This is the only port forwarding that happens inside a router or a switch. Delivering the packet to a specified MAC address is the only duty of a switch.
Every packet you send through a router goes to its destination based on the routing table. Several protocols work in this layer to make the source-to-destination delivery possible, and some of them are ARP, IP, RARP etc.
Additionally, a packet is encapsulated with information from the upper layers as it moves down the layers. So, at the receiver side, the packet arrives at the network layer, gets decapsulated and is moved to the transport layer, which then decapsulates the packet and sends it to the corresponding process based on the port.
So, what I am saying is that there is no connection between a process (port number) and the physical port of the router. It is true that the packet travels through the physical port of the router, but the router doesn't know anything about the process that sends the packet.

socket programming - why web server still using listen port 80 to communicate with client even after they accepted the connection?

Usually a web server listens for incoming connections on port 80. So my question is: isn't the general concept of socket programming that port 80 is only for listening for incoming connections, and that after the server has accepted the connection it will use another port, e.g. port 12345, to communicate with the client? But, when I look into the wireshark, the server is always using port 80 during the communication. I am confused here.
So what about https://www.facebook.com:443, which has hundreds of thousands of connections to it every second? Is it possible for a single port to handle such a large amount of traffic?
A particular socket is uniquely identified by a 5-tuple (i.e. a list of 5 particular properties.) Those properties are:
Source IP Address
Destination IP Address
Source Port Number
Destination Port Number
Transport Protocol (usually TCP or UDP)
These parameters must be unique for sockets that are open at the same time. Where you're probably getting confused here is what happens on the client side vs. what happens on the server side in TCP. Regardless of the application protocol in question (HTTP, FTP, SMTP, whatever,) TCP behaves the same way.
When you open a socket on the client side, it will select a random high-number port for the new outgoing connection. This is required, otherwise you would be unable to open two separate sockets on the same computer to the same server. Since it's entirely reasonable to want to do that (and it's very common in the case of web servers, such as having stackoverflow.com open in two separate tabs) and the 5-tuple for each socket must be unique, a random high-number port is used as the source port. However, each of those sockets will connect to port 80 at stackoverflow.com's webserver.
On the server side of things, stackoverflow.com can already distinguish between those two different sockets from your client, again, because they already have different client-side port numbers. When it sees an incoming request packet from your browser, it knows which of the sockets it has open with you to respond to because of the different source port number. Similarly, when it wants to send a response packet to you, it can send it to the correct endpoint on your side by setting the destination port number to the client-side port number it got the request from.
The bottom line is that it's unnecessary for each client connection to have a separate port number on the server's side because the server can already uniquely identify each client connection by its client IP address and client-side port number. This is the way TCP (and UDP) sockets work regardless of application-layer protocol.
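As a rough model of that lookup, think of the server's TCP stack as keeping a table keyed by the full 5-tuple. The sketch below is an illustration only: `FiveTuple`, `ConnectionTable` and the documentation-range IP addresses are assumptions made for the example, not how any particular kernel stores its connections.

```python
from typing import NamedTuple, Optional


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class ConnectionTable:
    """Toy connection table: one entry per unique 5-tuple."""

    def __init__(self) -> None:
        self._connections = {}

    def add(self, key: FiveTuple, label: str) -> None:
        # Two open connections may never share the same 5-tuple.
        if key in self._connections:
            raise ValueError(f"5-tuple already in use: {key}")
        self._connections[key] = label

    def lookup(self, key: FiveTuple) -> Optional[str]:
        # An incoming packet is matched to its connection by its 5-tuple.
        return self._connections.get(key)


# Two browser tabs on the same client, both talking to server port 80.
table = ConnectionTable()
tab1 = FiveTuple("203.0.113.7", 50001, "198.51.100.10", 80, "tcp")
tab2 = FiveTuple("203.0.113.7", 50002, "198.51.100.10", 80, "tcp")
table.add(tab1, "browser tab 1")
table.add(tab2, "browser tab 2")
```

Both entries share destination port 80 and still never collide, because the source ports differ.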
isn't the general concept of socket programming that port 80 is only for listening for incoming connections, and that after the server has accepted the connection it will use another port, e.g. port 12345, to communicate with the client?
No.
But, when I look into the wireshark, the server is always using port 80 during the communication.
Yes.
I am confused here.
Only because your 'general concept' isn't correct. An accepted socket uses the same local port as the listening socket.
So what about https://www.facebook.com:443, which has hundreds of thousands of connections to it every second? Is it possible for a single port to handle such a large amount of traffic?
A port is only a number. It isn't a physical thing. It isn't handling anything. TCP is identifying connections based on the tuple {source IP, source port, target IP, target port}. There's no problem as long as the entire tuple is unique.
Ports are a virtual concept, not a hardware resource. It's no harder to handle 10 000 connections on 1 port than 1 connection each on 10 000 ports (it's probably even much faster).
Not all servers are web servers listening on port 80, nor do all servers maintain lasting connections. Web servers in particular are stateless.
Your suggestion to open a new port for further communication is exactly what happens when using the FTP protocol, but as you have seen this is not necessary.
Ports are not a physical concept, they exist in a standardised form to allow multiple servers to be reachable on the same host without specialised multiplexing software. Such software does still exist, but for entirely different reasons (see: sshttp). What you see as a response from the server on port 80, the server sees as a reply to you on a not-so-random port the OS assigned your connection.
When a listening server socket accepts a TCP connection request, a function such as Socket java.net.ServerSocket.accept() returns a new communication socket whose local port number is the same as the port passed to java.net.ServerSocket.ServerSocket(int port).

Can two applications listen to the same port?

Can two applications on the same machine bind to the same port and IP address? Taking it a step further, can one app listen to requests coming from a certain IP and the other to another remote IP?
I know I can have one application that starts off two threads (or forks) to have similar behavior, but can two applications that have nothing in common do the same?
The answer differs depending on what OS is being considered. In general though:
For TCP, no. You can only have one application listening on the same port at one time. Now if you had 2 network cards, you could have one application listen on the first IP and the second one on the second IP using the same port number.
For UDP (Multicasts), multiple applications can subscribe to the same port.
Edit: Since Linux Kernel 3.9 and later, support for multiple applications listening to the same port was added using the SO_REUSEPORT option. More information is available at this lwn.net article.
Yes (for TCP) you can have two programs listen on the same socket, if the programs are designed to do so. When the socket is created by the first program, make sure the SO_REUSEADDR option is set on the socket before you bind(). However, this may not be what you want. What this does is an incoming TCP connection will be directed to one of the programs, not both, so it does not duplicate the connection, it just allows two programs to service the incoming request. For example, web servers will have multiple processes all listening on port 80, and the O/S sends a new connection to the process that is ready to accept new connections.
SO_REUSEADDR
Allows other sockets to bind() to this port, unless there is an active listening socket bound to the port already. This enables you to get around those "Address already in use" error messages when you try to restart your server after a crash.
Yes.
1. Multiple listening TCP sockets, all bound to the same port, can co-exist, provided they are all bound to different local IP addresses. Clients can connect to whichever one they need to. This excludes 0.0.0.0 (INADDR_ANY).
2. Multiple accepted sockets can co-exist, all accepted from the same listening socket, all showing the same local port number as the listening socket.
3. Multiple UDP sockets all bound to the same port can all co-exist provided either the same condition as at (1) or they have all had the SO_REUSEADDR option set before binding.
4. TCP ports and UDP ports occupy different namespaces, so the use of a port for TCP does not preclude its use for UDP, and vice versa.
Reference: Stevens & Wright, TCP/IP Illustrated, Volume II.
In principle, no.
It's not written in stone; but it's the way all APIs are written: the app opens a port, gets a handle to it, and the OS notifies it (via that handle) when a client connection (or a packet in UDP case) arrives.
If the OS allowed two apps to open the same port, how would it know which one to notify?
But... there are ways around it:
As Jed noted, you could write a 'master' process, which would be the only one that really listens on the port and notifies others, using any logic it wants to separate client requests.
On Linux and BSD (at least) you can set up 'remapping' rules that redirect packets from the 'visible' port to different ones (where the apps are listening), according to any network related criteria (maybe network of origin, or some simple forms of load balancing).
Yes, definitely. As far as I remember, support for SO_REUSEPORT was introduced from kernel version 3.9 onwards (not sure about the version). SO_REUSEPORT allows binding to the exact same port and address, as long as the first server sets this option before binding its socket.
It works for both TCP and UDP. Refer to the link for more details: SO_REUSEPORT
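A minimal sketch of the SO_REUSEPORT case, assuming a platform where Python exposes `socket.SO_REUSEPORT` (Linux 3.9+, the BSDs, macOS). The function name is hypothetical, and it returns `None` where the option is not available:

```python
import socket


def two_listeners_with_reuseport():
    """Open two listening TCP sockets on the same address and port via SO_REUSEPORT."""
    if not hasattr(socket, "SO_REUSEPORT"):
        return None  # option not exposed on this platform
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # The option has to be set on every socket before bind().
        for sock in (first, second):
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
        first.bind(("127.0.0.1", 0))
        first.listen()
        port = first.getsockname()[1]
        second.bind(("127.0.0.1", port))  # same address, same port
        second.listen()
        return port, second.getsockname()[1]
    finally:
        first.close()
        second.close()
```

Both sockets live in one process here for brevity; the option works the same way when the two sockets belong to different processes run by the same user.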
No. Only one application can bind to a port at a time, and behavior if the bind is forced is indeterminate.
With multicast sockets -- which sound like nowhere near what you want -- more than one application can bind to a port as long as SO_REUSEADDR is set in each socket's options.
You could accomplish this by writing a "master" process, which accepts and processes all connections, then hands them off to your two applications who need to listen on the same port. This is the approach that Web servers and such take, since many processes need to listen to 80.
Beyond this, we're getting into specifics -- you tagged both TCP and UDP, which is it? Also, what platform?
You can have one application listening on one port for one network interface. Therefore you could have:
httpd listening on remotely accessible interface, e.g. 192.168.1.1:80
another daemon listening on 127.0.0.1:80
Sample use case could be to use httpd as a load balancer or a proxy.
When you create a TCP connection, you ask to connect to a specific TCP address, which is a combination of an IP address (v4 or v6, depending on the protocol you're using) and a port.
When a server listens for connections, it can inform the kernel that it would like to listen to a specific IP address and port, i.e., one TCP address, or on the same port on each of the host's IP addresses (usually specified with IP address 0.0.0.0), which is effectively listening on a lot of different "TCP addresses" (e.g., 192.168.1.10:8000, 127.0.0.1:8000, etc.)
No, you can't have two applications listening on the same "TCP address," because when a message comes in, how would the kernel know to which application to give the message?
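The default behaviour is easy to reproduce. In this sketch (the function name is an illustrative assumption) a second socket tries to bind to the exact address and port that a first socket is already listening on, with no reuse options set on either:

```python
import errno
import socket


def second_bind_fails():
    """Return True if binding a second socket to a listening address is refused."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))
        first.listen()
        try:
            # Same IP address and same port as the listening socket.
            second.bind(first.getsockname())
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

This is the same "Address already in use" error a server prints when another program already holds its port.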
However, in most operating systems you can set up several IP addresses on a single interface (e.g., if you have 192.168.1.10 on an interface, you could also set up 192.168.1.11, if nobody else on the network is using it), and in those cases you could have separate applications listening on port 8000 on each of those two IP addresses.
Just to share what @jnewton mentioned.
I started an nginx and an embedded tomcat process on my mac. I can see both processes running on port 8080.
LT<XXXX>-MAC:~ b0<XXX>$ sudo netstat -anp tcp | grep LISTEN
tcp46 0 0 *.8080 *.* LISTEN
tcp4 0 0 *.8080 *.* LISTEN
Another way is to use a program listening on one port that analyses the kind of traffic (ssh, https, etc.) and redirects it internally to another port on which the "real" service is listening.
For example, for Linux, sslh: https://github.com/yrutschle/sslh
If at least one of the remote IPs is already known, static and dedicated to talking to only one of your apps, you may use an iptables rule (table nat, chain PREROUTING) to redirect incoming traffic from this address on the "shared" local port to any other port where the appropriate application actually listens.
Yes.
From this article:
https://lwn.net/Articles/542629/
The new socket option allows multiple sockets on the same host to bind to the same port
Yes and no. Only one application can actively listen on a port. But that application can bequeath its connection to another process. So you could have multiple processes working on the same port.
You can make two applications listen for the same port on the same network interface.
There can only be one listening socket for the specified network interface and port, but that socket can be shared between several applications.
If you have a listening socket in an application process and you fork that process, the socket will be inherited, so technically there will be now two processes listening the same port.
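A sketch of the fork case, assuming a Unix-like system where `os.fork()` is available (the function name is hypothetical). The parent creates the listening socket and forks; only the child calls accept(), and it answers with its own process ID so the parent can check who served the connection:

```python
import os
import socket


def accept_in_forked_child():
    """Fork after listen(); the child accepts on the inherited listening socket."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]

    child_pid = os.fork()
    if child_pid == 0:
        # Child: it never called bind() or listen() itself, yet accept() works
        # because the listening socket was inherited across fork().
        try:
            conn, _ = listener.accept()
            conn.sendall(str(os.getpid()).encode())
            conn.close()
        finally:
            os._exit(0)

    # Parent: act as the client and read the child's reply until EOF.
    with socket.create_connection(("127.0.0.1", port)) as client:
        chunks = []
        while True:
            chunk = client.recv(64)
            if not chunk:
                break
            chunks.append(chunk)
    os.waitpid(child_pid, 0)
    listener.close()
    return child_pid, int(b"".join(chunks).decode())
```

The parent keeps its own copy of the listening socket open the whole time, so for a while two processes hold the same listening port, which is what pre-forking servers rely on.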
I have tried the following, with socat:
socat TCP-L:8080,fork,reuseaddr -
And even though I have not made a connection to the socket, I cannot listen twice on the same port, in spite of the reuseaddr option.
I get this message (which I expected before):
2016/02/23 09:56:49 socat[2667] E bind(5, {AF=2 0.0.0.0:8080}, 16): Address already in use
If by applications you mean multiple processes then yes but generally NO.
For example Apache server runs multiple processes on same port (generally 80).It's done by designating one of the process to actually bind to the port and then use that process to do handovers to various processes which are accepting connections.
Short answer:
Going by the answer given here, you can have two applications listening on the same IP address and port number, so long as one of the ports is a UDP port while the other is a TCP port.
Explanation:
The concept of port is relevant on the transport layer of the TCP/IP stack, thus as long as you are using different transport layer protocols of the stack, you can have multiple processes listening on the same <ip-address>:<port> combination.
One doubt that people have is if two applications are running on the same <ip-address>:<port> combination, how will a client running on a remote machine distinguish between the two? If you look at the IP layer packet header (https://en.wikipedia.org/wiki/IPv4#Header), you will see that bits 72 to 79 are used for defining protocol, this is how the distinction can be made.
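Bits 72 to 79 are simply byte 9 of the IPv4 header. The sketch below builds a minimal 20-byte header by hand and reads that byte back. The helper names and the documentation-range addresses are illustrative assumptions, and the checksum is left at zero because nothing here is sent on the wire:

```python
import socket
import struct


def build_ipv4_header(protocol, src="192.0.2.1", dst="192.0.2.2"):
    """Pack a minimal 20-byte IPv4 header (no options, checksum left at 0)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit words
    return struct.pack(
        "!BBHHHBBH4s4s",
        version_ihl,
        0,         # DSCP/ECN
        20,        # total length: header only, no payload
        0,         # identification
        0,         # flags and fragment offset
        64,        # time to live
        protocol,  # byte 9, i.e. bits 72 to 79
        0,         # header checksum (not computed in this sketch)
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def protocol_of(header):
    """Return the protocol number stored in byte 9 of an IPv4 header."""
    return header[9]
```

The assigned protocol numbers are 6 for TCP and 17 for UDP (`socket.IPPROTO_TCP` and `socket.IPPROTO_UDP` in Python). The receiving host reads this field first to pick the transport protocol, and only then looks up the port within that protocol.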
If however you want to have two applications on same TCP <ip-address>:<port> combination, then the answer is no (An interesting exercise will be launch two VMs, give them same IP address, but different MAC addresses, and see what happens - you will notice that some times VM1 will get packets, and other times VM2 will get packets - depending on ARP cache refresh).
I feel that by making two applications run on the same <ip-address>:<port> you want to achieve some kind of load balancing. For this you can run the applications on different ports, and write iptables rules to split the traffic between them.
Also see @user6169806's answer.