Why do outgoing sockets need port numbers? - sockets

I understand why a server would need sockets for incoming data, but I do not understand why it is necessary that a socket connecting to another computer needs a source port.

While others have mentioned the exact reason why, let me illustrate the point by giving you an example:
Say you want to ssh to your server. OK, you ssh in and do some stuff. Then you tail a log file. So now you don't have access to the console anymore. No problem you think, I'll ssh again...
With one port number, if you ssh again that second connection will be a mirror of the first since the server won't know that there are two connections (no source port number to tell the difference) so you're out of luck.
With two port numbers you can ssh a second time to get a second console.
Say you browse a website, say Stackoverflow. You're reading a question but you think you've seen it before. You open a new tab in your browser to stackoverflow to do a search.
With only one port number the server have no way of knowing which packet belongs to which socket on the client so opening a second page will not be possible (or worse, both pages receive mixed data from each other).
With two port numbers the server will see two different connections from the client and send the correct data to the correct tab.
So you need two port numbers for client to tell what data is coming from what server and for the server to tell what data is coming from which socket from the client.

A TCP connection is defined in terms of the source and destination IP addresses and port numbers.
Otherwise for example you could never distinguish between two connections to the same server from the same client host.

Check out this link:
http://compnetworking.about.com/od/basiccomputerarchitecture/g/computer-ports.htm
Ultimately, they allow different applications and services to share the same networking resources. For example, your browser probably uses port 80, but your email application may use port 25.

TCP communication is two-way. A segment being sent from the server, even if it is in response to a segment from the client, is an incoming segment as seen from the client. If a client opens multiple connections to the same port on the server (such as when you load multiple StackOverflow pages at once), both the server and the client need to be able to tell the TCP segments from the different connections apart; this is done by looking at the combination of source port and destination port.

Related

How many sockets do Google open for every request it receives?

The following is my recent interview experience with a reputed network software company. I was asked questions about interconnecting TCP level and web requests and that confused me a lot. I really would like to know expert opinions on the answers. It is not just about the interview but also about a fundamental understanding of how networking work (or how application layer and transport layer cross-talk, if at all they do).
Interviewer: Tell me the process that happens behind the scenes when
I open a browser and type google.com in it.
Me: The first thing that happens is a socket is created which is
identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}. The
SRC-PORT number is a random number given by the browser. Usually the TCP/IP
connection protocol (three-way handshake is established). Now
both the client (my browser) and the server (Google) are ready to handle
requests. (TCP connection is established).
Interviewer: Wait, when does the name resolution happen?
Me: Yep, I am sorry. It should have happened before even the socket is created.
DNS name resolve happens first to get the IP address of Google to
reach at.
Interviewer: Is a socket created for DNS name resolution?
Me: hmm, I actually do not know. But all I know DNS name resolution is
connectionless. That is, it's not TCP but UDP. Only a single
request-response cycle happens. (So there is a new socket created for DNS
name resolution).
Interviewer: google.com is open for other requests from other
clients. So is establishing your connection with Google blocking
other users?
Me: I am not sure how Google handles this. But in a typical socket
communication, it is blocking to a minimal extent.
Interviewer: How do you think this can be handled?
Me: I guess the process forks a new thread and creates a socket to handle my
request. From now on, my socket endpoint of communication with
Google is this child socket.
Interviewer: If that is the case, is this child socket’s port number
different than the parent one?
Me: The parent socket is listening at 80 for new requests from
clients. The child must be listening at a different port number.
Interviewer: How is your TCP connection maintained since your destination port number has changed. (That is the src-port number sent on Google's packet) ?
Me: The dest-port that I see as a client is always 80. When
a response is sent back, it also comes from port 80. I guess the OS/the
parent process sets the source port back to 80 before it sends back the
post.
Interviewer: How long is your socket connection established with
Google?
Me: If I don’t make any requests for a period of time, the
main thread closes its child socket and any subsequent requests from
me will be like I am a new client.
Interviewer: No, Google will not keep a dedicated child socket for
you. It handles your request and discards/recycles the sockets right
away.
Interviewer: Although Google may have many servers to serve
requests, each server can have only one parent socket opened at port 80. The number of clients to access Google's webpage must be larger than the number of servers they have. How is this usually handled?
Me: I am not sure how this is handled. I see the only way it could
work is spawn a thread for each request it receives.
Interviewer: Do you think the way Google handles this is different from
any bank website?
Me: At the TCP-IP socket level, it should be
similar. At the request level, slightly different because a session
is maintained to keep state between requests for banks' websites.
If someone can give an explanation of each of the points, it will be very helpful for many beginners in networking.
How many sockets do Google open for every request it receives?
This question doesn't actually appear in the interview, but it's in your title so I'll answer it. Google doesn't open any sockets at all. Your browser does that. Google accepts connections, in the form of new sockets, but I wouldn't describe that as 'opening' them.
Interviewer : Tell me the process that happens behind the scene when I open a browser and type google.com in it.
Me : The first thing that happens is a socket is created which is identified by {SRC-IP, SRC-PORT, DEST-IP, DEST-PORT, PROTOCOL}.
No. The connection is identified by the tuple. The socket is an endpoint to the connection.
The SRC-PORT number is a random number given by browser.
No. By the operating system.
Usually TCP/IP connection protocol (three way handshake is established). Now the both client (my browser) and server (google) are ready to handle requests. (TCP connection is established)
Interviewer: wait, when does the name resolution happens?
Me: yep, I am sorry. It should have happened before even the socket is created. DNS name resolve happens first to get the IP address of google to reach at.
Interviewer : Is a socket created for DNS name resolution?
Me : hmm, Iactually do not know. But all I know DNS name resolution is a connection-less. That is it not TCP but UDP. only a single request-response cycle happens. (so is a new socket created for DNS name resolution).
Any rationally implemented browser would delegate the entire thing to the operating system's Sockets library, whose internal functioning depends on the OS. It might look at an in-memory cache, a file, a database, an LDAP server, several things, before going out to a DNS server, which it can do via either TCP or UDP. It's not a great question.
Interviewer: google.com is open for other requests from other clients. so is establishing you connection with google is blocking other users?
Me: I am not sure how google handles this. But in a typical socket communication, it is blocking to a minimal extent.
Wrong again. It has very little to do with Google specifically. A TCP server has a separate socket per accepted connection, and any rationally constructed TCP server handles them completely independently, either via multithreading, muliplexed/non-blocking I/O, or asynchronous I/O. They don't block each other.
Interviewer : How do you think this can be handled?
Me : I guess the process forks a new thread and creates a socket to handle my request. From now on, my socket endpoint of communication with google is this this child socket.
Threads are created, not 'forked'. Forking a process creates another process, not another thread. The socket isn't 'created' so much as accepted, and this would normally precede thread creation. It isn't a 'child' socket.
Interviewer: If that is the case, is this child socket’s port number different than the parent one.?
Me: The parent socket is listening at 80 for new requests from clients. The child must be listening at a different port number.
Wrong again. The accepted socket uses the same port number as the listening socket, and the accepted socket isn't 'listening' at all, it is receiving and sending.
Interviewer: how is your TCP connection maintained since your Dest-port number has changed. (That is the src-port number sent on google's packet) ?
Me: The dest-port as what I see as client is always 80. when request is sent back, it also comes from port 80. I guess the OS/the parent process sets the src port back to 80 before it sends back the post.
This question was designed to explore your previous wrong answer. Your continuation of your wrong answer is still wrong.
Interviewer : how long is your socket connection established with google?
Me : If I don’t make any requests for a period of time, the main thread closes its child socket and any subsequent requests from me will be like am a new client.
Wrong again. You don't know anything about threads at Google, let alone which thread closes the socket. Either end can close the connection at any time. Most probably the server end will beat you to it, but it isn't set in stone, and neither is which if any thread will do it.
Interviewer : No, google will not keep a dedicated child socket for you. It handles your request and discards/recycles the sockets right away.
Here the interviewer is wrong. He doesn't seem to have heard of HTTP keep-alive, or the fact that it is the default in HTTP 1.1.
Interviewer: Although google may have many servers to serve requests, each server can have only one parent socket opened at port 80. The number of clients to access google webpage must be exceeding larger than the number of servers they have. How is this usually handled?
Me : I am not sure how this is handled. I see the only way it could work is spawn a thread for each request it receives.
Here you haven't answered the question at all. He is fishing for an answer about load-balancers or round-robin DNS or something in front of all those servers. However his sentence "the number of clients to access google webpage must be exceeding larger than the number of servers they have" has already been answered by the existence of what you are both incorrectly calling 'child sockets'. Again, not a great question, unless you've reported it inaccurately.
Interviewer : Do you think the way Google handles is different from any bank website?
Me: At the TCP-IP socket level, it should be similar. At the request level, slightly different because a session is maintained to keep state between requests in Banks websites.
You almost got this one right. HTTP sessions to Google exist, as well as to bank websites. It isn't much of a question. He should be asking for facts, not your opinion.
Overall, (a) you failed the interview completely, and (b) you indulged in far too much guesswork. You should have simply stated 'I don't know' and let the interview proceed to things that you do know about.
For point #6, here is how I understand it: if both ends of an end to end connection were the same as that of another socket, there would indeed be no way to distinguish both socket, but if a single end is not the same as that of the other socket, then it's easy to distinguish both. So there is not need to turn destination port 80 (the default) forth and back, since the source ports differ.

With IPC how to tell client which port the server is listening on?

When using sockets for IPC, you can get the system to pick a random free port as described in this question here:
On localhost, how to pick a free port number?
There is a norm that you put the process ID in a ".pid" file so that you for example easy can find the apache process id and in this way kill it.
But what is the best practice way to exchange port number, when the OS picks a random port for you to listen on?
To inform about the port number you can use any other transport mechanism, which can be a file on the shared disk, pigeon mail, SMS, third-party server, dynamically updated DNS entry etc. The parties must have something common to share, then they can communicate. I omit port scanning here for the obvious reason.
There's one interesting aspect about not random ports but "floating" port number: if you don't want to keep the constant port but can choose the listening port within certain range, then you can use the algorithm for calculating actual port number based on date or day of week or other periodical or predictable information. This way the client knows where to look the server.
One more option is that during communication started on one port, the server and the client will agree where the server will wait for the client to have the next session.

Lan chat design

I'm in the process of trying to write a chat application and I have a few issues
that I trying to work out. The application is basically a chat application that works on a Lan. One client acts as the
host and other clients can connect to the host and publicly chat among themselves. I want also the option of a client starting
a private chat with an already connected client. So what is the best way for this to happen. For example should the request message (which
contains the ip address of client) route through the host and then if the requested client wants to connect , then they initiate the connection
using ip of the requesting client. Should this also be on a separate port number. Does it matter if your application uses a number of ports.
Or, when ever a client connects to a host, the host should send them a list of users with there ip addresses, and then the client can
attempt a connection with the other client for a private chat.
Hope this all makes sense. Any help would be appreciated
Thanks
If you are just interested in a quick-and-dirty chat facility that only needs to work over a LAN, I'd suggest having all clients send and receive broadcast UDP packets on a single well-known port number. Then no server is necessary at all, and thus no discovery is necessary either, and things are a lot simpler.
If you really want to go the client-server route, though, you should have your server (aka host) machine accept TCP connections on a single well-known port, and then have it use select() or poll() to multiplex the incoming TCP connections and forward any data that comes in from each incoming TCP socket to all of the others sockets. Clients can connect via TCP to the server at this well-known port, but the clients will have to have some way of knowing what IP address to connect to... either from having the user type in the IP address of the server, or by some discovery mechanism (broadcast UDP packets could be used to implement that). This way is a lot more work though.
I'm all for creating my own but depending on time constraints sometimes I look for alternatives like this I used it in a company I worked at before. It's really good. But if you decide to make your own you first have to map out a logic, structure, Database and so on before you even think about code..

How do I design a peer-to-peer app that avoids using listening sockets?

I've noticed that if you want to write an application that utilizes listening sockets, you need to create port forwarding rules on your router. If I want to connect two computers without either one of the the computers messing about with router settings, is there a way that I can get the two clients to connect to each other without either of them using listening sockets? There would need to be another server somewhere else telling them to connect but is it possible?
Some clarifications, and an answer:
Routers don't care about, or handle ports, that is the role of a firewall, which do port forwarding. The router/firewall combined device most of us have at home adds to the common misunderstanding.
Can you connect two computers without ServerSocket? No. You can use UDP (a stateless, connectionless communication protocol), but the role of a ServerSocket is to "listen" for incoming connection requests, and generate a Socket from those requests, which creates a communications channel between two endpoints. A Socket has both an InputStream and an OutputStream, so it can both read at write at either end. At that point (once the connection is made), the distinction between client/server is arbitrary, since a Socket is a two-way connection object, which allows both sides to send/receive.
What about proxying? Doesn't that allow connections between two computers without a ServerSocket? Well, no, because the server that's doing the proxying still has to be using a ServerSocket. Depending on what application you're trying to implement, this might be the way to go, or or might just add overhead. Even if there were "another server somewhere else telling them to connect", somebody has to listen for a connection request, which is the job of the ServerSocket.
If connections are happening over already open ports (most publicly accessible servers have ports <1024 not blocked by firewalls, but exceptions exist), then you shouldn't need to change firewall settings to get the connection to work.
So, to reiterate, the ONLY role of a ServerSocket (as far as your question is concerned) is to listen for incoming connection requests, and from those requests, create a Socket, which is a two-way communications channel between the two end points.
To answer the question, "How do I design a peer-to-peer app that avoids using listening sockets?", you don't. In the case of something like Vuze, the software acts as both client and server simultaneously, hence the term "peer", vs. "client" or "server" alone. In Vuze every client is a server, and every server (except for the tracker) is a client.
If you need a TCP connection between the 2 computers and both of them are behind routers (and you don't want to set up port forwarding) I think the only other possibility you have is having a third server somewhere that isn't behind a firewall running a ServerSocket and accepting connections between your 2 other computers and proxying communications between the 2. You can't establish a TCP Connection between the 2 without one listening to a socket and the other connecting to it.
Q: If I want to connect two computers without either one of the the
computers messing about with router settings, is there a way that I
can get the two clients to connect to each other
Yes: have the server listen on an open port :)

Multiple TCP/IP servers and sharing the same "well known port" ... somehow?

I apologize for the weird question wording... here's the design problem:
I am developing a server (on Linux using C++, FWIW) that provides a service to many instances of a client application running on consumer PCs.
I want the following:
1) All clients first identify themselves to a "gatekeeper" server application. Consider this a login procedure, with credentials like a user name and password being passed in. Call the gatekeeper program "gserver". (for gatekeeper.)
2) Once each client has been validated, it is then placed into a long term connection with one of several instances of a different server application running on the same physical server box bound to the same server address. Call any of these instances "wserver" (for "working" server.)
So, what the client sees is that a "gatekeeper" application gives it passworded access to one of several "working" servers running on the same box.
Here is the "real" challenge: we want to exclusively use a "well known" port number for the inbound server connections (like port 80 or 443, say.) Or, our own "well known" port.
We would prefer not to have to make the client talk to a second port on the server for the long term connection phase with wserver(n). The problem with this, of course, is that only one server process at a time can be bound to the same port and server address.
This implies that a connection made by the client with gserver must also fill the role of the long term connection. The only way I see to accomplish this is that gserver must, after login, act like a proxy and copy traffic between itself and the client to the particular wserver(n) that the client is bound to logically.
It would be ideal if a TCP/IP connection first made between client(n) and gserver could be somehow "transported" to another application on the same server, intact, and could then be sustained by one of the wserver(n) instances for the long term connection.
I know that web servers do something like this for spreading out server loads. "Load balancing". The main difference here is that the "balancing" is the allocation of a particular user to a particular wserver(n) instance. But I also have the impression that load balancing is a kind of proxying - which I am trying to avoid (since it complicates the architecture and adds overhead as well as a single point of failure.)
This is a conceptual and design question. Don't worry about source code examples, unless they are absolutely essential to get the ideas across. If we pin down an approach, I can code it up.
Thanks!
What you are looking for is file descriptor passing. See UNP 15.7. One well-known heavy user of this facility is postfix.
I developed such an application long time ago. Since multiple servers can't listen on the same port. What you need is to have gserver listening on the well-known port. Once connection is established, pass the connection to the other servers via an Unix socket. Once the connection is passed to other server, gserver is out of picture. It can die and the other server will be still serving the connection.
I dont' know if this applies to your design, but the usual solution (as implemmented by the xinetd daemon) is to fork() and then exec() the process. For example, xinetd may serve services like rlogin, rsh, tftp, telnet, etc. which are actually served by different programs. This will not be useful to you if your wservers are processes already running in the system.