Understanding network byte order (TCP) - sockets

Network byte order used e.g. by TCP is big endian. This doesn't effect the actual payload the user is sending over the network, does it? This is in regards to the e.g. 16-bit port number and a 32-bit IPv4 address, which TCP exchanges itself and thus requires the participants to agree on endianness.
In other words: Assuming 2 participants with same endianness machines and a simple setup with TCP sockets, there is no need to convert anything in terms of data/payload, right?
I'm just a little confused as there is a lot of talk regarding endianness conversion with regards to network byte order. For example, in IBM's docs (IBM Docs Network Byte Order) it says:
The TCP/IP standard network byte order is big-endian. In order to participate in a TCP/IP network, little-endian systems usually bear the burden of conversion to network byte order.
To me this sounds like conversion depends on network byte order when in fact the only thing that matter are the endianness on the participating machines, doesn't it.

This doesn't effect the actual payload the user is sending over the network, does it?
TCP and UDP just transport bytes without any inherent meaning. How the payloads needs to be interpreted and if endianness is relevant is part of the application layer, i.e. depends on the application protocol spoken.

Related

C# BeginSend/BeginReceive sometimes send or receive data attatched [duplicate]

I have two apps sending tcp packages, both written in python 2. When client sends tcp packets to server too fast, the packets get concatenated. Is there a way to make python recover only last sent package from socket? I will be sending files with it, so I cannot just use some character as packet terminator, because I don't know the content of the file.
TCP uses packets for transmission, but it is not exposed to the application. Instead, the TCP layer may decide how to break the data into packets, even fragments, and how to deliver them. Often, this happens because of the unterlying network topology.
From an application point of view, you should consider a TCP connection as a stream of octets, i.e. your data unit is the byte, not a packet.
If you want to transmit "packets", use a datagram-oriented protocol such as UDP (but beware, there are size limits for such packets, and with UDP you need to take care of retransmissions yourself), or wrap them manually. For example, you could always send the packet length first, then the payload, over TCP. On the other side, read the size first, then you know how many bytes need to follow (beware, you may need to read more than once to get everything, because of fragmentation). Here, TCP will take care of in-order delivery and retransmission, so this is easier.
TCP is a streaming protocol, which doesn't expose individual packets. While reading from stream and getting packets might work in some configurations, it will break with even minor changes to operating system or networking hardware involved.
To resolve the issue, use a higher-level protocol to mark file boundaries. For example, you can prefix the file with its length in octets (bytes). Or, you can switch to a protocol that already handles this kind of stuff, like http.
First you need to know if the packet is combined before it is sent or after. Use wireshark to check it the sender is sending one packet or two. If it is sending one, then your fix is to call flush() after each write. I do not know the answer if the receiver is combining packets after receiving them.
You could change what you are sending. You could send bytes sent, followed by the bytes. Then the other side would know how many bytes to read.
Normally, TCP_NODELAY prevents that. But there are very few situations where you need to switch that on. One of the few valid ones are telnet style applications.
What you need is a protocol on top of the tcp connection. Think of the TCP connection as a pipe. You put things in one end of the pipe and get them out of the other. You cannot just send a file through this without both ends being coordinated. You have recognised you don't know how big it is and where it ends. This is your problem. Protocols take care of this. You don't have a protocol and so what you're writing is never going to be robust.
You say you don't know the length. Get the length of the file and transmit that in a header, followed by the number of bytes.
For example, if the header is a 64bits which is the length, then when you receive your header at the server end, you read the 64bit number as the length and then keep reading until the end of the file which should be the length.
Of course, this is extremely simplistic but that's the basics of it.
In fact, you don't have to design your own protocol. You could go to the internet and use an existing protocol. Such as HTTP.

Does raw sockets reassemble the packet?

I implement a simple tunneling and encryption of outgoing IP packets, i.e. each packet+IP header is encrypted and added with a new IP header.
For this purpose I use raw sockets in the sender and the receiver.
I just try to figure out if fragmentation of the outgoing packets can result in breaking the capability to decrypt them again.
Do raw sockets provide the assembled packet or do I need to implement de-fragmentation by myself ?
Assuming that you are referring to RAW sockets of the Berkeley Sockets API (aka BSD Sockets),
the answer is:
No, they do not combine fragments of fragmented IP packets. You will receive the IP packets, including IP header, just as they did arrive at your network interface.
Please note that there exist various implementations of BSD sockets in different operation systems. You didn't say for which system(s) you are developing that code. And despite the fact that the POSIX standard based its network API on BSD sockets, POSIX doesn't specify RAW sockets at all, so a POSIX conforming operation system doesn't even have to support RAW sockets.
And despite the fact many systems have adopted the BSD API, among them Linux/Android, FreeBSD, macOS/iOS, and even Windows, there are some important differences in their implementations. E.g. they support different socket options, their socket options behave in different way, or they support different extensions. As an example for differences in socket options, see here. So your system may theoretically have an option you can set to get reassembled packets. This would not be portable but RAW sockets themselves are only limited portable to begin with.
This is OS specific, but generally it depends on how you read them. Take a look at a couple of linux docs on POSIX sockets:
packet
socket
recvfrom
In particular you use a SOCK_RAW then recvfrom will not always return full packets. See the following quotes:
If a message is too long to fit in the supplied buffer,
excess bytes may be discarded depending on the type of socket the
message is received from.
If len is too small to fit an entire packet, the excess bytes will be returned from the next read.
The receive calls
normally return any data available, up to the requested amount,
rather than waiting for receipt of the full amount requested.
To your question:
Do raw sockets provide the assembled packet or do I need to implement de-fragmentation by myself ?
They don't, you need to de-fragment yourself. If the socket isn't flushed, or fragmentation occurs the call will return any data available, possibly only partial packets the expectation is that you restructure them.

In which layer big/little endian conversion is done?

Suppose a packet is sent by UDP. I'm wondering in which layer the big/little endian conversion of payload is done.
Why do you think this is done? The transport protocol has no notion of your data, it transmits bytes. You, as an application protocol designer, will have to decide and ensure to send your data in a certain endianness.
UDP doesn't know that four successive bytes somewhere in a packet form a 32-bit integer, for example. They might as well form four 1-byte values, for example UTF-8 code points. Do you want UDP to randomly invert your strings?
See also Sending UDP packets in the correct Endianness.
'Endian conversion' of a protocol is usually done in the layer that handles the protocol.
For example: NTP program/library itself creates a NTP-message in network byte order before passing the message to UDP-layer (=sending). UDP layer will create UDP header in network byte order, but does not change the payload (=NTP message).
Lower layer protocols doesn't know and care what kind of upper layer protocols are carried in the payload. There are few exceptions or borderline cases, like checksum offloading in NIC.

Difference between port number and socket

I started reading UNIX network programming by W. Richard Stevens and I am very confused between a port and a socket . when I read on internet it said that socket is an endpoint for a connection and for port number it was written that , IP address and port no form a unique pair .
So now my question is that :
(1) What is the difference between these two ?
(2)How are sockets and ports internally manipulated. Are sockets a file ?
(3) How is data sent when we send it using an application ?
(4) If sockets are there then why do we use port numbers ?
Sorry for my English.. Thanks in advance for the reply.
(1) What is the difference between these two ?
A computer running IP networking always has a fixed number of ports -- 65535 TCP ports and 65535 UDP ports. A network packet's header contains a 16-bit unsigned-short field in it specifying which of those ports the packet should be delivered to.
Sockets, on the other hand, are demand-allocated by each program. A socket serves as a handle/interface between the program and the OS's networking stack, and is used to build and specify a context for a particular networking task. A socket may or may not be bound to a port, and it's also possible (and common) to have more than one socket bound to a particular port at the same time.
(2)How are sockets and ports internally manipulated. Are sockets a
file ?
That's totally up to the OS; and different OS's do it different ways. It's unclear what you mean by "a file" in this question, but in general sockets do not have anything to do with the filesystem. On the other hand, one feature of Unix-style OS's is that socket descriptors are also usable in the much same way that filesystem file descriptors are -- i.e. you can pass them to read()/write()/select(), etc and get useful results. Other OS's, such as Windows, do not support that feature and for them you must use a completely separate set of function calls for sockets vs files.
(3) How is data sent when we send it using an application ?
The application calls the send() function (or a similar function such as sendto()), passes in the relevant socket descriptor along with a pointer to the data it wants to send, and then it is up to the network stack to copy that data into a packet and deliver it to the appropriate networking device for transmission.
(4) If sockets are there then why do we use port numbers ?
Because you need a way to communicate with particular programs on other computers, and computer A has no way of knowing what sockets are present (if any) on computer B. But port numbers are fixed, so it is possible for programmers to use them as a rendezvous point for communication -- for example, your web browser knows that a web server is almost certain to be listening for incoming HTTP requests on port 80 whenever the server is running, so it can send its requests to port 80 with a reasonable expectation of getting a useful response back. If it had to specify a socket as a target instead, what would it specify? The server's socket numbers are arbitrary and likely to be different every time the server runs.
1) What is the difference between these two ?
(2)How are sockets and ports internally manipulated. Are sockets a file ?
A socket is (IP+Port):
A socket is like a telephone (i.e. end to end device for communication)
IP is like your telephone number (i.e. address of your socket)
Port is like the person you want to talk to (i.e. the service you want to order from that address)
A socket is part of a process. A process in linux is a file.
(3) How is data sent when we send it using an application ?
Data is sent by converting it to bytes. There is little/big endian problem regarding the ordering in bytes so you have to take this into consideration when coding.
(4) If sockets are there then why do we use port numbers ?
A socket is (address + port) that means the person you want to talk to (port) can be reachable from many telephone numbers (IPs) and thus from many sockets (that does not mean that the person on one telephone number will reply to you the same as the one in the other telephone number because his job here/there may be different).

Does UDP allow repacketization?

I know that for TCP you can have for example Nagle's Algorithm enabled. However, can you have something similar for UDP?
Practical Question(assume UDP socket):
If I call send() two times in a short period of time with 1 byte of data in each send() call. Is it possible that the transport layer decides to send only 1 UPD packet with the 1 byte + 1 byte = 2 bytes of data?
Thanks in advance!
No. UDP datagrams are delivered intact exactly as sent, or not at all.
Not according to the RFC (RFC 768). Above IP facilities themselves, UDP really only provides, as extras, port-based routing and a little bit of extra detection for corruption or misrouting.
That means there's no facility to combine datagrams. In fact, since it's meant to be transaction oriented, I would say that combining two transactions into one may well be a bad idea in terms of keeping these transaction disparate.
Otherwise, you would need a layer above UDP which could figure out how to extract these transactions from a datagram. At the moment, that's not necessary since the datagram is the transaction.
As added support (though not, of course, definitive) for this contention, see the UDP wikipedia page:
Datagrams – Packets are sent individually and are checked for integrity only if they arrive. Packets have definite boundaries which are honored upon receipt, meaning a read operation at the receiver socket will yield an entire message as it was originally sent.
However, the best support for it comes from one of its clients. UDP was specially engineered for TFTP (among other things) and that protocol breaks down if you cannot distinguish a transaction.
Specifically, one of the TFTP transaction types is the data transaction which consists of an opcode, block number and up to 512 bytes of data. Without a length indication at the start or a sentinel value at the end, there is no way to work out where the next transaction would start unless there is a one-to-one mapping between transaction and datagram.
As an aside, the other four TFTP transaction types have either a fixed length or end-of-string sentinel values but the data transaction is the decider here.