Understanding socket basics

Understanding socket basics - sockets

I've been reading up on basic network programming, but am having a difficult time finding a straight-forward explanation for what exactly and socket is, and how it relates to either the OSI or TCP/IP stack.
Can someone explain to me what a socket is? Is it a programmer- or API-defined data structure, or is it a hardware device on a network card?
What layers of the mentioned network models deal with "raw" sockets? Transport layer? Network layer?
In terms of the data they pass between them, are socket text-based or binary?
Is there an alternative to sockets-based network programming? Or do all networked applications use some form of socket?
If I can get this much I should have a pretty clear understanding of everything else I'm reading. Thanks for any help!

Short answers:
Socket is an abstraction of an IP connection endpoint - so if you think of it as an API structure, you are not very far off. Please do read http://en.wikipedia.org/wiki/Internet_socket
Internet layer i.e. IP Protocol. In practice you usually use explicitly sockets that bind to a certain transport layer parameters (datagram/UDP or stream/TCP)
Sockets send data, in network byte order - whether it is text or binary, depends on the upper layer protocol.
Theoretically, probably yes - but in practice all IP traffic is done using 'sockets'

Socket is a software mechanism provided by the operating system. Like its name implies, you can think of it like an "electrical outlet" or some electrical connector, even though socket is not a physical device, but a software mechanism. In real world when you have two electrical connectors, you can connect them with a wire. In the same way in network programming you can create one socket on one computer and another socket on another computer and then connect those sockets. And when you write data to one of them, you receive it on the other one. There are also a few different kinds of sockets. For example if you are programming a server software, you want to have a listening socket which never sends or receives actual data but only listens for and accepts incoming connections and creates a new socket for each new connection.

A socket, in C parlance, is a data structure in kernel space, corresponding to one end-point of a UDP or TCP session (I am using session very loosely when talking about UDP). It's normally associated with one single port number on the local side and seldom more than one "well-known" number on either side of the session.
A "raw socket" is an end-point on, more or less, the physical transport. They're seldom used in applications programming, but sometimes used for various diagnostic things (traceroute, ping, possibly others) and may required elevated privileges to open.
Sockets are, in their nature, a binary octet-transport. It is not uncommon to treat sockets (TCP sockets, at least) as being text-based streams.
I have not yet seen a programming model that doesn't involve something like sockets, if you dig deep enough, but there have certainly been other models of doing networking. The "/net/" pseudo-filesystem, where opening "/net/127.0.0.0.1/tcp/80" (or "tcp/www") would give you a file handle where writes end up on a web server on localhost is but one.

Suppose your PC at home, and you have two browser windows open.
One looking at the facebook website, and the other at the Yahoo website.
The connection to facebook would be:
Your PC – IP1+port 30200 ——– facebook IP2 +port 80 (standard port)
The combination IP1+30200 = the socket on the client computer and IP2 + port 80 = destination socket on the facebook server.
The connection to Yahoo would be:
your PC – IP1+port 60401 ——–Yahoo IP3 +port 80 (standard port)
The combination IP1+60401 = the socket on the client computer andIP3 + port 80 = destination socket on the Yahoo server.

Related

Difference between "tcp/socket" vs "tcp/ip"

What is the difference between a "tcp/socket" and "tcp/ip" connexion?
When you say that you use "tcp/ip", do you necessarily use a "tcp/socket"?
Thanks!

A socket is a general communication means provided by your
operating system.
There are many kinds of them, for very distinct purposes
(not only networking).
I guess that when you think about tcp/socket, you mean a
socket dedicated to the TCP protocol.
TCP/IP can be seen as two different things, depending on the context.
It can be the TCP/IP network stack as a whole: not only the TCP and IP
specific protocols but the set of protocols (and implementations) we
find around these.
Of course, the other way to see TCP/IP is to consider only the TCP
transport protocol relying on the IP network protocol.
The various operating systems implement many protocols
in the TCP/IP stack.
To use them, a programmer asks his/her operating system
a specific resource: a socket.
It's difficult to say more with few words.
Some books or online documentation could help go further.

I think you missing something in the question. Anyway in short...
TCP/IP is basically name given to protocol we (networking devices) follow it forms the fundamental of todays internet. It involves agreement between two devices how the want ro communicate eg. Which segment of a frame has what information as in the end its all just 10...
There are 5 layers (some argue 4) in this model one layer is Network Layer right at the middle of all and it generally uses IPv4.And just above this is our Transport Layer which may use TCP or UDP as protocol depending on service you want. So thats the summary of TCP/ IP as the most used set of protocols of all.
When connecting to a remote server your browser needs to know what kind of service he is about to get from that server eg. a video mail or file transfer or just a http page. Thats when a TCP/Socket comes into picture where there is a port no assigned for every service. Eg port 443 is for https and so on. All you need to do is open a socket connection over that port number on that machine
Remember if a particular port of a server is not in LISTEN mode you cannot connect to that application via that port
Eg. If a server serves its webpage it might not allow you to connect its port responsible for FTP.

When 2 computer interact over a network connection then first they have to make a socket connection?

Please answers in yes or no. This will solve my doubt.
1. Is a post/get request sent from computer-1 to Computer-2 then first they have to make a socket connection?
2. When 2 computers connect with ssh then first they have to make a socket connection , then only then can talk to each other ?

The examples you give involve network connections and sockets are a common abstraction used when communicating over modern computer networks; however, other abstractions could be used. This is what Damien_The_Unbeliever is saying in the comments. For example, you could ask whether for loops are the only way to iterate over an array. The answer is the same: for loops are a common abstraction to loop over elements of an array, but there are other equivalent (in a machine-code sense) methods of doing so using other abstractions.
More fundamentally, computers can communicate with each other without using networks at all. You could have computers communicate over an interface consisting of webcams and monitors; sending is accomplished by putting something on the screen and receiving is accomplished by receiving the video feed. You could do the same with microphones and speakers. You could do the same with robotic arms, keyboards and mice. Two computers can communicate with each other using a human courier; my work and home computers do it regularly! Computers could write letters and mail them, deliver them or use carrier pigeons to send them to other computers designed to accept information in those formats.

Please answers in yes or no. This will solve my doubt.
[ok]
Is a post/get request sent from computer-1 to Computer-2 then first they have to make a socket connection?
In this case, Yes, but if request is going from browser then it do for you.You can see thr url for get and post have the port number in them. If not mentioned the default port is 80, in general. For example If you use WebSphere Application Server, the default port is 9081 or 80 if IBM HTTP Server is configured.
When 2 computers connect with ssh then first they have to make a socket connection , then only then can talk to each other ?
In this Case, again Yes, the port is 22 in ssh

UDP for multiplayer game

I have no experience with sockets nor multiplayer programming.
I need to code a multiplayer mode for a game I made in c++. It's a puzzle game but the game mode will not be turn-based, it's more like cooperative.
I decided to use UDP, so I've read some tutorials, and all the samples I find decribes how to create a client that sends data and a server that receives it.
My game will be played by two players, and both will send and receive data to/from the other.
Do I need to code a client and a server?
Should I use the same socket to send and receive?
Should I send and receive data in the same port?
Thanks, I'm kind of lost.

Read how the masters did it:
http://www.bluesnews.com/abrash/chap70.shtml
Read the code:
git clone git://quake.git.sourceforge.net/gitroot/quake/quake
Open one UDP socket and use sendto and recvfrom. The following file contains the functions for the network client.
quake/libs/net/nc/net_udp.c
UDP_OpenSocket calls socket (PF_INET, SOCK_DGRAM, IPPROTO_UDP)
NET_SendPacket calls sendto
NET_GetPacket calls recvfrom

Do I need to code a client and a server?
It depends. For a two player game, with both computers on the same LAN, or both on the open Internet, you could simply have the two computers send packets to each other directly.
On the other hand, if you want your game to work across the Internet, when one or both players are behind a NAT and/or firewall, then you have the problem that the NAT and/or firewall will probably filter out the other player's incoming UDP packets, unless the local player goes to the trouble of setting up port-forwarding in their firewall... something that many users are not willing (or able) to do. In that case, you might be better off running a public server that both clients can connect to, which forwards data from one client to another. (You might also consider using TCP instead of UDP in that case, at least as a fallback, since TCP streams are in general likely to have fewer issues with firewalls than UDP packets)
Should I use the same socket to send and receive?
Should I send and receive data in the same port?
You don't have to, but you might as well -- there's no downside to using just a single socket and a single port, and it will simplify your code a bit.

Note that this answer is all about using UDP sockets. If you change your mind to use TCP sockets, it will almost all be irrelevant.
Do I need to code a client and a server?
Since you've chosen to to use UDP (a fair choice if your data isn't really important and benefits more from lower latency than reliable communication), you don't have much of a choice here: a "server" is a piece of code for receiving packets from the network, and your "client" is for sending packets into the network. UDP doesn't provide any mechanism for the server to communicate to the client (unlike TCP which establishes a 2 way socket). In this case, if you want to have two way communication between your two hosts, they'll each need server and client code.
Now, you could choose to use UDP broadcasts, where both clients listen and send on the broadcast address (usually 192.168.1.255 for home networks, but it can be anything and is configurable). This is slightly more complex to code for, but it would eliminate the need for client/server configuration and may be seen as more plug 'n play for your users. However, note that this will not work over the Internet.
Alternatively, you can create a hybrid method where hosts are discovered by broadcasting and listening for broadcasts, but then once the hosts are chosen you use host to host unicast sockets. You could provide fallback to manually specify network settings (remote host/port for each) so that it can work over the Internet.
Finally, you could provide a true "server" role that all clients connect to. The server would then know which clients connected to it and would in turn try to connect back to them. This is a server at a higher level, not at the socket level. Both hosts still need to have packet sending (client) and receiving (server) code.
Should I use the same socket to send and receive?
Well, since you're using UDP, you don't really have a choice. UDP doesn't establish any kind of persistent connection that they can communicate back and forth over. See the above point for more details.
Should I send and receive data in the same port?
In light of the above question, your question may be better phrased "should each host listen on the same port?". I think that would certainly make your coding easier, but it doesn't have to. If you don't and you opt for the 3rd option of the first point, you'll need a "connect back to me on this port" datafield in the "client's" first message to the server.

Sockets VS WinPcap

Does anyone know why should I use Winpcap and not just .Net sockets to sniff packets on my local pc?
TY

Sockets (.NET, Winsock, etc.) normally collect at layer 7, the Application layer. That is, whatever is sent by the sender is what is received by the receiver. All of the various headers that are added automatically on the sending side are stripped off by the time the receiver reads the data from the socket.
It is possible to configure a socket to be a raw socket, in which case, you can see all of the headers down to layer 3, the Network layer. Further still, you can put the raw socket in promiscuous mode, which allows you to see all of the traffic on the network, not just the packets destined for your machine. But even this is limited. For example, when you configure the raw socket, you specify the protocol type to use, e.g., IP, ICMP, etc. This limits the socket to "seeing" packets that adhere to that protocol. I have been unable to figure out how to make the socket see all packets at layer 3 regardless of protocol.
Winpcap operates as a device driver at layer 2, the Data Link layer. In this case, you see literally all of the packets on the network with full headers down to layer 2. Winpcap also offers filtering capability so you can narrow down the packets that are reported to you based on whatever criteria you provide.
As far as choosing between them, it really boils down to the requirements of your specific task. If you are trying to implement any kind of realistic network analysis capability, you'll be hardpressed to do that with just sockets. Winpcap makes more sense in that case. However, if you are only interested in IP packets, for example, then sockets will work fine for that.

As far as I understanf .Net sockets are an IPC to communicate between 2 processes. While winpcap is a library that help you to access the data link layer an sniff pacquets going through your network hardware (or virtual) devices on your machine. Data link layer allow to get the data on any socket (.Net or not) created on your system.

Can socket connections be multiplexed?

Is it possible to multiplex sa ocket connection?
I need to establish multiple connections to yahoo messenger and i am looking for a way to do this efficiently without having to hold a socket open for each client connection.
so far i have to use one socket for each client and this does not scale well above 50,000 connections.
oh, my solution is for a TELCO, so i need to at least hit 250,000 to 500,000 connections
i'm planing to bind multiple IP addresses to a single NIC to beat the 65k port restriction per IP address.
Please i would any help, insight i can get.
**most of my other questions on this site have gone un-answered :) **
Thanks

This is an interesting question about scaling in a serious situation.
You are essentially asking, "How do I establish N connections to an internet service, where N is >= 250,000".
The only way to do this effectively and efficiently is to cluster. You cannot do this on a single host, so you will need to be able to fragment and partition your client base into a number of different servers, so that each is only handling a subset.
The idea would be for a single server to hold open as few connections as possible (spreading out the connectivity evenly) while holding enough connections to make whatever service you're hosting viable by keeping inter-server communication to a minimum level. This will mean that any two connections that are related (such as two accounts that talk to each other a lot) will have to be on the same host.
You will need servers and network infrastructure that can handle this. You will need a subnet of ip addresses, each server will have to have stateless communication with the internet (i.e. your router will not be doing any NAT in order to not have to track 250,000+ connections).
You will have to talk to AOL. There is no way that AOL will be able to handle this level of connectivity without considering cutting your connection off. Any service of this scale would have to be negotiated with AOL so both you and they would be able to handle the connectivity.
There are i/o multiplexing technologies that you should investigate. Kqueue and epoll come to mind.
In order to write this massively concurrent and teleco grade solution, I would recommend investigating erlang. Erlang is designed for situations such as these (multi-server, massively-multi-client, massively-multithreaded telecommunications grade software). It is currently used for running Ericsson telephone exchanges.

While you can listen on a socket for multiple incoming connection requests, when the connection is established, it connects a unique port on the server to a unique port on the client. In order to multiplex a connection, you need to control both ends of the pipe and have a protocol that allows you to switch contexts from one virtual connection to another or use a stateless protocol that doesn't care about the client's identity. In the former case you'd need to implement it in the application layer so that you could reuse existing connections. In the latter case you could get by using a proxy that keeps track of which server response goes to which client. Since you're connecting to Yahoo Messenger, I don't think you'll be able to do this since it requires an authenticated connection and it assumes that each connection corresponds to a single user.

You can only multiplex multiple connections over a single socket if the other end supports such an operation.
In other words it's a function protocol - sockets don't have any native support for it.
I doubt yahoo messenger protocol has any support for it.
An alternative (to multiple IPs on a single NIC) is to design your own multiplexing protocol and have satellite servers that convert from the multiplex protocol to the yahoo protocol.

I'll trow in another approach you could consider (depending on how desperate you are).
Note that operating system TCP/IP implementations need to be general purpose, but you are only interested in a very specific use-case. So it might make sense to implement a cut-down version of TCP/IP (which only handles your use-case, but does that very well) in your application code.
For example, if you are using Linux, you could route a couple of IP addresses to a tun interface and have your application handle the IP packets for that tun interface. That way you can implement TCP/IP (optimised for your use-case) entirely in your application and avoid any operating system restriction on the number of open connections.
Of course, it's quite a bit of work doing the TCP/IP yourself, but it really depends on how desperate you are - i.e. how much hardware can you afford to throw at the problem.

500,000 arbitrary yahoo messenger connections - is your telco doing this on behalf of Yahoo? It seems like whatever solution has been in place for many years now should be scalable with the help of Moore's Law - and as far as I know all the IM clients have been pretty effective for a long time, and there's no pressing increase in demand that I can think of.
Why isn't this a reasonable problem to address with hardware plus traditional solutions?

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse