How can a firewall/iptables check incoming TCP traffic on already bound ports? - sockets

As far as I know, only one process can be bound to a given port of the same protocol, and in order to read information arriving at a port, a socket must be bound to that port.
Is there a way of sharing a socket with another process, or something like that?

Is there a way of sharing a socket with another process, or something like that?
Sharing a socket, and thus the port, between two processes is possible (for example after fork()), but this is probably not what you want for data analysis, since once one process reads the data the other no longer gets it.
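As a minimal sketch (assuming a POSIX system; port 12345 is arbitrary and error handling is omitted), this is roughly how two processes end up sharing one connected socket after fork(), and why only one of them sees any given chunk of data:

    /* Sketch: after fork(), parent and child share the same connected socket.
     * Assumes a POSIX system; port 12345 is arbitrary, error handling omitted. */
    #include <stdio.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int srv = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(12345),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(srv, (struct sockaddr *)&addr, sizeof addr);
        listen(srv, 5);

        int conn = accept(srv, NULL, NULL);         /* one connected socket ...            */
        if (fork() == 0) {
            char buf[512];                          /* ... now shared by child and parent  */
            ssize_t n = recv(conn, buf, sizeof buf, 0);
            printf("child read %zd bytes\n", n);    /* the parent never sees these bytes   */
            _exit(0);
        }
        char buf[512];
        ssize_t n = recv(conn, buf, sizeof buf, 0); /* only gets data the child did not read */
        printf("parent read %zd bytes\n", n);
        close(conn);
        close(srv);
        return 0;
    }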
How can a firewall/iptables check incoming TCP traffic on already bound ports?
Packet filters like iptables work inside the kernel and see the data before it is delivered to the socket. It does not even matter whether a socket is bound to this specific port at all. Unless the packet filter drops the data, it is forwarded unchanged to the socket (if there is one).
Passive IDS like Snort, or tools like tcpdump, get the raw packets, and here too it does not matter whether a socket exists at all. They can only read the packets, i.e. they cannot modify or block them.
Application-level firewalls or (reverse) proxies have their own socket and receive the data there (directly, or redirected to them by the packet filter). They can then analyse the data and will explicitly forward it (possibly after modification) to the original application.
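For illustration, here is a rough sketch of that reverse-proxy idea: the proxy listens on its own socket (port 8080 is a made-up example), connects to the real application (assumed here to be on 127.0.0.1:80), and explicitly relays what it accepts. Only one direction is relayed and error handling is omitted, so this is an outline of the concept rather than a working proxy:

    /* Sketch of the (reverse) proxy idea: the proxy has its own listening socket,
     * connects to the real application, and explicitly relays the data.
     * Ports/addresses are made-up examples; only one direction is relayed and
     * error handling is omitted. */
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in pub = { .sin_family = AF_INET, .sin_port = htons(8080),
                                   .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(lsock, (struct sockaddr *)&pub, sizeof pub);
        listen(lsock, 16);

        for (;;) {
            int client = accept(lsock, NULL, NULL);

            /* connect to the real application behind the proxy (assumed 127.0.0.1:80) */
            int backend = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in app = { .sin_family = AF_INET, .sin_port = htons(80) };
            inet_pton(AF_INET, "127.0.0.1", &app.sin_addr);
            connect(backend, (struct sockaddr *)&app, sizeof app);

            char buf[4096];
            ssize_t n;
            while ((n = recv(client, buf, sizeof buf, 0)) > 0) {
                /* here the data could be inspected or modified before forwarding */
                send(backend, buf, (size_t)n, 0);
            }
            close(backend);
            close(client);
        }
    }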

Related

Difference between port number and socket

I started reading UNIX Network Programming by W. Richard Stevens and I am very confused about the difference between a port and a socket. When I read about it on the internet, it said that a socket is an endpoint for a connection, and for the port number it was written that an IP address and a port number form a unique pair.
So now my questions are:
(1) What is the difference between these two?
(2) How are sockets and ports manipulated internally? Is a socket a file?
(3) How is data sent when we send it using an application?
(4) If sockets are there then why do we use port numbers?
Sorry for my English. Thanks in advance for the reply.
(1) What is the difference between these two?
A computer running IP networking always has a fixed number of ports -- 65535 TCP ports and 65535 UDP ports. A network packet's header contains a 16-bit unsigned-short field in it specifying which of those ports the packet should be delivered to.
Sockets, on the other hand, are demand-allocated by each program. A socket serves as a handle/interface between the program and the OS's networking stack, and is used to build and specify a context for a particular networking task. A socket may or may not be bound to a port, and it's also possible (and common) to have more than one socket bound to a particular port at the same time.
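As a small illustration of that last point (assuming a POSIX system; port 8080 is an arbitrary example and error handling is omitted), the listening socket and every socket returned by accept() are all bound to the same local port, as getsockname() shows:

    /* Sketch: the listening socket and every accepted socket share local port 8080.
     * The port is an arbitrary example; error handling omitted. */
    #include <stdio.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(8080),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(lsock, (struct sockaddr *)&addr, sizeof addr);
        listen(lsock, 8);

        for (;;) {
            int conn = accept(lsock, NULL, NULL);         /* a brand-new socket ...       */

            struct sockaddr_in local;
            socklen_t len = sizeof local;
            getsockname(conn, (struct sockaddr *)&local, &len);
            printf("accepted socket %d also uses local port %d\n",
                   conn, ntohs(local.sin_port));          /* ... bound to the same port   */
            close(conn);
        }
    }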
(2) How are sockets and ports manipulated internally? Is a socket a file?
That's totally up to the OS, and different OSes do it in different ways. It's unclear what you mean by "a file" in this question, but in general sockets do not have anything to do with the filesystem. On the other hand, one feature of Unix-style OSes is that socket descriptors are usable in much the same way as filesystem file descriptors are -- i.e. you can pass them to read()/write()/select(), etc. and get useful results. Other OSes, such as Windows, do not support that feature, and for them you must use a completely separate set of function calls for sockets vs. files.
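A tiny sketch of that Unix-style behaviour, assuming fd is an already-connected TCP socket descriptor (the function name is made up for illustration):

    /* Sketch: on Unix-like systems the plain file I/O calls also work on sockets.
     * fd is assumed to be an already-connected TCP socket; echo_once() is a
     * made-up helper for illustration. */
    #include <string.h>
    #include <unistd.h>

    ssize_t echo_once(int fd)
    {
        const char *msg = "hello\n";
        if (write(fd, msg, strlen(msg)) < 0)   /* same call used for regular files */
            return -1;

        char buf[256];
        return read(fd, buf, sizeof buf);      /* read() works on the socket too   */
    }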
(3) How is data sent when we send it using an application?
The application calls the send() function (or a similar function such as sendto()), passes in the relevant socket descriptor along with a pointer to the data it wants to send, and then it is up to the network stack to copy that data into a packet and deliver it to the appropriate networking device for transmission.
(4) If sockets are there then why do we use port numbers?
Because you need a way to communicate with particular programs on other computers, and computer A has no way of knowing what sockets are present (if any) on computer B. But port numbers are fixed, so it is possible for programmers to use them as a rendezvous point for communication -- for example, your web browser knows that a web server is almost certain to be listening for incoming HTTP requests on port 80 whenever the server is running, so it can send its requests to port 80 with a reasonable expectation of getting a useful response back. If it had to specify a socket as a target instead, what would it specify? The server's socket numbers are arbitrary and likely to be different every time the server runs.
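A minimal sketch of that rendezvous: the client only needs the server's IP address and the well-known port, never the server's socket descriptor. The address below is just a placeholder and error handling is omitted:

    /* Sketch: a client reaches a web server by IP address and well-known port 80;
     * it never needs to know which socket descriptor the server happens to use.
     * The address below is only an illustrative placeholder; error handling omitted. */
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(80) };
        inet_pton(AF_INET, "192.0.2.80", &srv.sin_addr);   /* placeholder address */

        if (connect(fd, (struct sockaddr *)&srv, sizeof srv) == 0) {
            const char *req = "GET / HTTP/1.0\r\n\r\n";
            send(fd, req, strlen(req), 0);                 /* port 80 is the rendezvous point */
        }
        close(fd);
        return 0;
    }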
(1) What is the difference between these two?
(2) How are sockets and ports manipulated internally? Is a socket a file?
A socket is (IP+Port):
A socket is like a telephone (i.e. end to end device for communication)
IP is like your telephone number (i.e. address of your socket)
Port is like the person you want to talk to (i.e. the service you want to order from that address)
A socket belongs to a process. In Linux a socket is exposed to the process as a file descriptor, in line with the "everything is a file" philosophy.
(3) How is data sent when we send it using an application?
Data is sent as a sequence of bytes. There is a little-endian/big-endian issue regarding byte ordering, so you have to take this into consideration when coding.
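For example, in C the usual way to handle this for multi-byte header fields is to convert them to network byte order (big-endian) before sending; the helper below is a made-up illustration:

    /* Sketch: convert multi-byte header fields to network byte order (big-endian)
     * before sending, so both ends agree regardless of their native endianness.
     * pack_message() is a made-up helper for illustration. */
    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>

    size_t pack_message(uint8_t *out, uint16_t id, uint32_t length)
    {
        uint16_t net_id  = htons(id);       /* host -> network order, 16-bit */
        uint32_t net_len = htonl(length);   /* host -> network order, 32-bit */

        memcpy(out, &net_id, sizeof net_id);
        memcpy(out + sizeof net_id, &net_len, sizeof net_len);
        return sizeof net_id + sizeof net_len;   /* number of bytes ready to send() */
    }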
(4) If sockets are there then why do we use port numbers?
A socket is (address + port), which means the person you want to talk to (the port) can be reachable via many telephone numbers (IPs) and thus via many sockets (that does not mean the person at one telephone number will answer you the same way as the one at the other telephone number, because his job here and there may be different).

At the level of IP, does "leave the connection open" have a specific technical meaning - such as intermediate gateways storing an IP map entry?

I am an experienced socket-level programmer in C++, but I do not understand what happens at the IP network level when a socket connection is left open (vs. being closed by calling the close function on the socket from within code).
I have studied the IP header and tried to understand if leaving a socket open has any implications at the IP level.
At the TCP level, leaving a socket open could make sense to me, because perhaps that means the "sequence number" field in the TCP header continues to increment. However, that would be a purely endpoint-based implementation, and therefore could not cut down on transit time for TCP packets. It is my understanding that leaving a connection open generally means that transit time between endpoints across the internet is decreased for packets.
The question is, does it mean anything at the IP level to leave a socket connection open?
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
(Perhaps the overhead of DNS lookup is also avoided in this fashion.)
Am I correct in guessing that "leaving a connection open" corresponds to map entries remaining in place on intermediate IP gateways (which speeds up packet transfer)?
Direct answer: No.
Your question suggests that you don't fully understand the purpose of TCP, which is to establish a data stream between two hosts. Keeping that in mind, the purpose of leaving a connection open should be obvious: if you close the connection, the stream will end.
The status of a TCP connection is not visible on the IP level; it's only of relevance to TCP. With the exception of NAT gateways, intermediate hosts do not generally keep track of the status of TCP connections passing through them. (In many cases, it'd be impossible for them to do so -- large routers have far more connections running through them than they could possibly track.)
The best guess I have is that if a socket connection remains open, that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
This guess is incorrect. A router picks a route for each packet based on its destination IP address, using a set of routing tables it keeps internally. Read up on BGP for details on how this is determined on large routers; on smaller routers, the routing table is typically defined by the administrator.
First of all, let's clear up a misconception:
that intervening gateways along the complete IP network path will attempt to leave an entry in their mapping table so that the next hop can be executed immediately, without needing to do a broadcast to all connected gateways in order to determine the next hop.
Routers never "broadcast to all connected gateways" in order to determine the next hop. If a packet arrives and the router does not already know how to route it, the packet is simply dropped (possibly with an ICMP error message being sent back to the source). The job of the routing protocols that run on routers is to prepopulate the router's routing table with routes learned from peers so that they are then prepared to receive packets and route them.
Also, "the complete IP network path" is not well-defined. The network path can change at any time as links fail on the network or new links become available. It can even change from one packet to the next in the absence of routing changes due to load balancing.
Back to your question: no, whether or not a socket is closed has no impact on IP. IP is stateless in the sense that every packet is self-contained and routed independently.
Whether or not a socket is closed does make a difference to TCP, but, as you note, that concerns only the two nodes at the endpoints of the connection.
The impact of "leaving a connection open" on speed, such that it is, is that establishing a connection in TCP requires a round-trip. But more to the point, a connection also has semantic meaning to most protocols running on TCP. Two bits of data sent on the same connection are related in a way that two bits of data sent on different connections are not.

Socket programming - API doubt

There was this question posted in class today about API design in socket programming.
Why are listen() and accept() provided as different functions and not merged into one function?
Now as far as I know, listen marks a bound socket as ready to accept connections and sets an upper bound on the queue of incoming connections. If accept and listen were merged, could such a queue not still be maintained?
Or is there some other explanation?
Thanks in advance.
listen() means "start listening for clients"
accept() means "accept a client, blocking until one connects if necessary"
It makes sense to separate these two, because if they were merged, then the single merged function would block. This would cause problems for non-blocking I/O programs.
For example, let's take a typical server that wants to listen for new client connections but also monitor existing client connections for new messages. A server like this typically uses a non-blocking I/O model so that it is not blocked on any one particular socket. So it needs a way to "start listening" on the server socket without blocking on it. Once listening on the server socket has been initiated, the server socket is added to the set of sockets being monitored via select() (or poll() on some systems). The select() call indicates when there is a client connection pending on the server socket. The program can then call accept() without fear of blocking on that socket.
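A rough sketch of that pattern (assuming lsock is already bound and listening, and clients[] is a hypothetical array of existing client sockets): select() watches all sockets at once, and accept() is only called once a connection is known to be pending, so it never blocks:

    /* Sketch of the pattern described above: select() watches the listening socket
     * and the existing client sockets; accept() is only called once a connection is
     * known to be pending, so it never blocks. lsock is assumed to be already bound
     * and listening; the clients[] array is hypothetical. */
    #include <sys/select.h>
    #include <sys/socket.h>

    void serve_once(int lsock, int clients[], int nclients)
    {
        fd_set rfds;
        FD_ZERO(&rfds);
        FD_SET(lsock, &rfds);
        int maxfd = lsock;

        for (int i = 0; i < nclients; i++) {            /* also watch existing clients */
            FD_SET(clients[i], &rfds);
            if (clients[i] > maxfd)
                maxfd = clients[i];
        }

        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) > 0) {
            if (FD_ISSET(lsock, &rfds)) {
                int conn = accept(lsock, NULL, NULL);   /* won't block: a client is pending   */
                (void)conn;                             /* a real server adds it to clients[] */
            }
            /* ... then check FD_ISSET(clients[i], &rfds) and recv() from ready clients ... */
        }
    }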
listen(2) makes a given TCP socket a server socket, i.e. it creates a queue for accepting connection requests from clients. Only the listening side's port, and possibly IP address, are bound (thus you need to call bind(2) before listen(2)). accept(2) then actually takes such a connection request from the queue and turns it into a connected socket (the four parts required for two-way communication - source IP address, source port number, destination IP address, and destination port number - are all assigned). listen(2) is called only once, while accept(2) is usually called multiple times.
Under the hood, bind assigns an address and a port to a socket descriptor. It means the port is now reserved for that socket, and therefore the system won't be able to assign the same port to another application (an exception exists, but I won't go into details here). It's also a one-time-per-socket operation.
Then listen is responsible for establishing the number of connections that can be queued for a given socket descriptor, and for indicating that you're now willing to receive connections.
On the other hand, accept is used to dequeue the first connection from the queue of pending connections, and create a new socket to handle further communication through it. It may be called multiple times, and generally is. By default, this operation is blocking if there are no connections in the queue.
Now suppose you want to use an async IO mechanism (like epoll, poll, kqueue, select, etc). If listen and accept were a single API, how would you indicate that a given socket is willing to receive connections? The async mechanism needs to know you wish to handle this type of event as well.
Since they have quite different semantics, it makes sense to keep them apart.

Why can't we use the process ID instead of the port we are binding to?

Why can't we use the process ID instead of the port we are binding to in socket programming?
In socket programming we create a socket, get a socket descriptor, and bind it to a specific port. For multiple connections, why don't we use the process ID, since every connection also belongs to a process, which has a process ID?
It's an interesting idea, but I think it would raise a few problems:
How would you know what process ID you wanted to connect to?
What if you wanted to listen on more than one "port" inside the same process? You only have one process ID.
TCP and UDP allocate 16 bits for port numbers, but process IDs are usually 32-bit (or bigger) values, so they wouldn't fit
There are many programs that don't have a networking aspect, and don't want one. Would automatically instantiating a network communications path to them be a potential security problem?
One trick you can do (especially with UDP multicast or broadcast) is to have several programs listen on the same port (via SO_REUSEPORT), so that when anyone sends out a UDP packet to that port, all of the programs receive it. That trick would be difficult or impossible if programs had to use their (unique) process ID numbers as port numbers.
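A sketch of that trick, assuming a platform that supports SO_REUSEPORT (e.g. Linux 3.9+ or the BSDs): several independent programs can each run this, bind the same UDP port, and all receive a broadcast or multicast datagram sent to it. Port 5000 is an arbitrary example and error handling is omitted:

    /* Sketch of the SO_REUSEPORT trick mentioned above: several independent programs
     * can each run this and bind the same UDP port, so a broadcast or multicast
     * datagram sent to port 5000 is delivered to all of them. Assumes a platform
     * that supports SO_REUSEPORT (e.g. Linux 3.9+ or the BSDs); error handling omitted. */
    #include <stdio.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_DGRAM, 0);

        int on = 1;
        setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof on);  /* allow sharing the port */

        struct sockaddr_in addr = { .sin_family = AF_INET, .sin_port = htons(5000),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(fd, (struct sockaddr *)&addr, sizeof addr);

        char buf[1500];
        ssize_t n = recvfrom(fd, buf, sizeof buf, 0, NULL, NULL);  /* each listener gets a copy */
        printf("received %zd bytes\n", n);
        return 0;
    }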
First, multiple connections can exist per process. Second, the socket API does not depend on any OS process API.
Because TCP has port numbers in the specification but it doesn't have process IDs.
Why would you want to use a process ID that you can't control when you can control the port number? And how would a process listen on multiple ports?

Sockets Async Connection

I am new to async socket connections. Can you please explain how this technology works?
There's an existing application (server) which requires socket connections to transmit data back and forth. I already created my application (.NET), but the server application doesn't seem to understand the XML data that I am sending. My documentation gives me two ports: one to send and another one to receive.
I need to be sure that I understand how this works.
I got the IP addresses and also the two ports to be used.
A socket is the most "raw" way you can send byte-level TCP and UDP traffic across a network.
For example, your browser uses a TCP socket connection to connect to the StackOverflow web server on port 80. Your browser and the server exchange commands and data according to an agreed-on structure/protocol (in this case, HTTP). An asynchronous socket is no different from a synchronous socket except that it does not block the thread that's using it.
This is really not the most ideal way to work (check and see if your server/vendor application supports SOAP/Web Services, etc), but if this is really the only way, there could be a number of reasons why it's failing. To name a few...
Not actually getting connected or sending data. Run a test using WinsockTool (http://www.isatools.org/tools/winsocktool.msi) and simulate your client first to make sure the server is working as expected.
Encoding incorrect - You're sending raw bytes across the network... Make sure you're using the correct encoding to convert your XML into bytes (ASCII, UTF8, etc).
Buffer Length - Your sending buffer (the amount of data you can transmit in one shot) may be too small or the server may expect a content of a certain length, and your XML could be getting truncated.
Let's clear up a misconception... sockets are FULL-DUPLEX: you connect to a server using one port, and then you can send AND receive data through the same socket; there is no need for two port numbers. (Actually, there is a local port assigned for receiving data, but it is: 1. assigned automatically when the socket is created/connected, unless you bind one yourself, and 2. of no use in the function calls that receive data.)
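To illustrate the full-duplex point (sketched in C, though the same holds for .NET sockets), one connected socket is enough to both send a request and receive the reply; the address and port below are placeholders:

    /* Sketch of the full-duplex point above (in C, but the same holds for .NET
     * sockets): one connected socket is used both to send the request and to
     * receive the reply. Address and port are placeholders; error handling omitted. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);

        struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(9000) };
        inet_pton(AF_INET, "192.0.2.10", &srv.sin_addr);  /* placeholder address */
        connect(fd, (struct sockaddr *)&srv, sizeof srv);

        const char *xml = "<request/>";
        send(fd, xml, strlen(xml), 0);                    /* send on the socket ...           */

        char reply[4096];
        ssize_t n = recv(fd, reply, sizeof reply, 0);     /* ... and receive on the same one  */
        printf("got %zd reply bytes\n", n);

        close(fd);
        return 0;
    }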
So you tell us that your documentation gives you two port numbers... I assume that the "server" is an already existing in-house application, and you are trying to talk to it. If the documentation lists two ports, then you will need two sockets: one for sending and another one for receiving. Now I would suggest you first use a synchronous socket before trying the async way: a synchronous socket is less error-prone for a first test.
(By the way, let's clear up another misconception: if well coded, once a server listens on a port, it can receive any number of connections through the same port number; there is no need to open two listening ports to accept two connections... Sorry for the re-alignment, but I've seen those two errors committed often enough that it gives me an urge to kill.)