I've created a client/server application that has its own data ACK system. It was originally written over TCP because of some limitations, but the base was designed with UDP in mind.
The packets that I send to the server have their own encapsulation (packet id and packet size headers; I know UDP also has a checksum, so I didn't add a header for that). But because of how TCP works, I know the server may not receive an entire packet in one read, so I gathered and buffered the received data until a full valid packet had arrived.
Now I have the chance to change my client/server program to UDP, and I know that one difference with TCP is that data is not received in the same order as sent (which is why I added a packet id header).
The thing I want to know is: if I send multiple packets, will they be received with no guaranteed order but with guaranteed encapsulation? I mean, if I send a packet with 1000 bytes of data and later another packet with 400 bytes of data, will the server receive two packets, one of 1000 bytes and another one of 400 bytes, or is there a chance it receives 200 of those 1000 bytes, then another 400 of those 1000 bytes, and later the rest, the way TCP can?
UDP is a datagram service. Datagrams may be split for transport, but they will be reassembled before being passed up to the application layer.
With small packet sizes you should have no concern that packets will be broken into multiple packets. That generally only becomes an issue when a packet exceeds what an Ethernet frame can carry.
You ask: "will the server receive 2 packets, one of 1000 bytes and another one of 400 bytes, or is there a chance to receive 200 of that 1000 bytes, then 400 bytes of that 1000 bytes and later the rest of the bytes like TCP can do?"
With a packet size under 1492 bytes there are not going to be any partial packets.
UPDATE:
It seems I need to clarify why I say UDP packet lengths of 1492 bytes or less will not affect transport robustness.
The maximum UDP length, as implicitly specified in RFC 768, is 65535 bytes including the 8-byte header, so the maximum payload length is 65527 bytes.
While this number should not be in dispute, the UDP data length is often reported incorrectly. This is exemplified in a previous post:
What is the largest Safe UDP Packet Size on the Internet
A data packet is not constrained by the MTU of the underlying network or by the communication protocol's frame length (e.g. IP and Ethernet respectively). Discrepancies between MTU and protocol lengths are remedied by fragmentation and reassembly.
Each type of network has a specific Maximum Transmission Unit (MTU). UDP is encapsulated within IP packets, and IP packets are in turn encapsulated by the transporting network's link-layer protocol. IP packets are often transmitted through networks of various types, including Ethernet, PPP, HDLC, and ADCCP.
When the MTU of a receiving network is less than that of the sending network, the receiving network must fragment the packet. When the packet is passed on to a network with a higher MTU, the receiver must reassemble any fragmented packets.
Ethernet is the de facto mainstream protocol with the lowest MTU; the non-mainstream ARCNET has an MTU of 507 bytes. The practical lowest MTU is therefore Ethernet's 1500 bytes, which, minus the overhead, gives a maximum payload length of 1492 bytes.
If a UDP packet has more than 1492 bytes, the data will likely be fragmented and reassembled. Fragmentation and reassembly add complexity to the already complex coupling of UDP and IP, and should therefore be avoided.
Because UDP is a non-guaranteed datagram delivery protocol, it boosts transport performance; robustness is left to the originating and terminating applications. RFC 1122 sets the standards for the link layer, IP layer, and transport layer; the UDP application is responsible for packetization, reassembly, and flow control.
The maximum UDP packet size can also be lowered by the communicating host's application layer; the packet length is a balance between performance and robustness. The typical maximum UDP data length at the application layer is whatever the IP protocol or the host's data link layer (typically Ethernet) allows.
It is the application's programmer who chooses whether to rely on the host's application layer or its data link layer. The host's application layer will detect UDP packet errors and discard the packet if necessary; when the application communicates directly with the data link layer, the application is responsible for detecting packet errors.
Using a maximum UDP data length equal to Ethernet's maximum payload length of 1492 bytes eliminates the issues of fragmentation and of the delivery order of multiple frames.
That is why I said fragmentation is not an issue with packet lengths of 1000 and 400 bytes.
###
I do not know what you mean by "guaranteed encapsulation"; it makes no sense to me.
With IP there is no guarantee of packet delivery or of ordering, whether UDP or TCP.
As long as you control both sides of the conversation, you can work out your own protocol within the data packet to handle ordering and lost packets. Reserve the first x bytes of the packet for a sequential order number and the total number of packets (e.g. 1 of 3, 2 of 3, 3 of 3). If the client side is missing a packet, the client must send a request for retransmission. You need to decide how far you want to go for data integrity, for example what happens if the retransmission request itself is lost.
That may be what you meant by "guaranteed encapsulation": other information within your datagram to ensure some integrity. You should add your own CRC for the total data being sent if it is broken into multiple datagrams; the UDP checksum is not very robust and covers only a single packet.
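As a rough, hedged sketch of that idea in Python (the 12-byte header layout, the struct format, and the 1400-byte chunk size are all illustrative assumptions, not a standard):

```python
import struct
import zlib

# Hypothetical header: message id, fragment number, total fragments, and a
# CRC32 computed over the complete (reassembled) message, not just this fragment.
HEADER = struct.Struct("!IHHI")  # 12 bytes

def make_fragments(msg_id, payload, chunk=1400):
    """Split a message into datagram payloads, each prefixed with our own header."""
    crc = zlib.crc32(payload)
    pieces = [payload[i:i + chunk] for i in range(0, len(payload), chunk)] or [b""]
    total = len(pieces)
    return [HEADER.pack(msg_id, seq, total, crc) + p
            for seq, p in enumerate(pieces, start=1)]

def parse_fragment(datagram):
    """Return (msg_id, seq, total, crc, data) for one received datagram."""
    msg_id, seq, total, crc = HEADER.unpack_from(datagram)
    return msg_id, seq, total, crc, datagram[HEADER.size:]
```

The receiver collects fragments per msg_id, requests retransmission of any missing seq values, and only accepts the message once the CRC of the reassembled payload matches.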
UDP is much faster than TCP, but TCP has flow control and guaranteed delivery.
UDP is good for streaming content like voice where a lost packet is not going to matter.
Network reliability has improved a lot since the days when these issues were a major concern.
Related
Our company developed stations that collect data in agricultural fields. These fields can be in the middle of nowhere, so the stations use GSM/GPRS with a SIM card that automatically switches to the strongest provider.
Every 5 minutes, an internet connection is set up to send data to a server. The data has a structure with packet length, command, sensor data and a CRC check, but these data structures are sent with an HTTP POST to a URL.
For 480 bytes of data, about 2550 bytes of data traffic are used; there is a lot of overhead in the HTTP protocol. Because we only need to send 480 bytes of data, we have about 80% overhead with the POST over HTTP. Now we have a few hundred stations and that number is growing, so costs for data traffic are increasing rapidly.
We want to redesign the transmission of the data. The data is sent by a Microchip microprocessor in the stations.
Our goal is to decrease the overhead as much as possible, with guaranteed data delivery. So I looked into TCP and UDP.
TCP has failure detection and recovery, but has a higher overhead.
UDP has lower overhead, but it is not guaranteed that data is delivered without failure.
My first idea is to build a server that listens on a TCP port and have the stations send the data over TCP, mainly because of guaranteed data delivery.
With UDP we would have to develop checking and resending of data ourselves, but the data structure of our records is already prepared for such checks.
So I am really in doubt about what to do, and I am trying to get an answer to these questions:
How many bytes of overhead would it take for TCP and UDP to send (and deliver) 480 bytes of data?
Are TCP and UDP the best options to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
How many bytes of overhead would it take for TCP and UDP to send (and deliver) 480 bytes of data?
A (typical) TCP header is 20 bytes long, although it can be (slightly) longer with options. If the entire 480 bytes are sent in a single TCP segment, you'd end up with 480 + 20 + 20 (IP header) = 520 bytes before layer-2 overhead.
UDP has an 8 byte header, so for UDP you'll have 480 + 8 + 20 = 508 bytes.
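For a sense of scale against the HTTP figure from the question, a quick back-of-the-envelope comparison (per message, ignoring layer-2 framing and, for TCP, ignoring connection setup, ACKs, and teardown):

```python
DATA = 480                     # application payload per message
totals = {
    "HTTP POST": 2550,         # figure reported in the question
    "TCP": DATA + 20 + 20,     # TCP header + IP header
    "UDP": DATA + 8 + 20,      # UDP header + IP header
}
for name, total in totals.items():
    print(f"{name}: {total} bytes on the wire, "
          f"{(total - DATA) / total:.0%} overhead")
```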
However, you should consider that TCP is a stream protocol. Reading from a TCP socket is like reading from a binary file: you'd need to split that stream into individual messages yourself, either by using some sort of delimiter or by prepending the length of each message (see the sketch below).
UDP on the other hand works on individual messages. Reading from a UDP socket would return messages one at a time.
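A minimal sketch of what that framing could look like on the TCP side, assuming a simple 2-byte length prefix (the helper names send_msg/recv_msg are made up for illustration):

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    # Prefix each message with its length so the receiver can find the boundary.
    sock.sendall(struct.pack("!H", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    # TCP may deliver the bytes in arbitrary chunks, so keep reading until done.
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf += chunk
    return buf

def recv_msg(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!H", recv_exact(sock, 2))
    return recv_exact(sock, length)
```

With UDP no such framing is needed: each sendto() on one side corresponds to at most one recvfrom() on the other.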
Are TCP and UDP the best options to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
UDP and TCP are the lowest-level transport protocols on the internet; HTTP and other high-level protocols are built on top of them. If the size of the data is critical, raw TCP and UDP are as low-overhead as you're going to get without using raw sockets and embedding your data directly into IP packets.
I have a question about socket programming. When I use a socket to send data, I can use an API such as sendto() to send using TCP or UDP.
For sendto(), I give an array pointer and the number of bytes I want to send.
In this case, if I give a large byte count (e.g. 20000 bytes), based on my understanding the MTU of the network will not be that big, so the socket actually sends multiple packets instead of one big packet. Since these 20000 bytes are split into several UDP/TCP packets, will these 20000 bytes be seen as one packet at the beginning? Is this process UDP/TCP fragmentation?
My other question is: if I pass a data size smaller than the MTU to sendto(), can I guarantee that one call to sendto() makes the socket send only one TCP/UDP packet?
Thanks in advance.
will these 20000 bytes be seen as one packet at beginning? Is this process UDP/TCP fragmentation?
UDP will send it as one datagram if your socket send buffer is large enough to hold it. Otherwise you will get EMSGSIZE. It may subsequently get fragmented at the IP layer, and if a fragment gets lost so does the whole datagram, but if all the fragments arrive the entire datagram will be received intact.
TCP will send it all, segmenting and fragmenting it however it sees fit. It will all arrive, intact and in order, unless there is a long enough network outage.
My other question is: if I pass a data size smaller than the MTU to sendto(), can I guarantee that one call to sendto() makes the socket send only one TCP/UDP packet?
UDP: yes.
TCP: no.
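A minimal illustration of the UDP case, using the 20000-byte figure from the question (the address is a placeholder; whether the call succeeds depends on the socket send buffer and protocol limits described above):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
print("send buffer size:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))

try:
    # One successful sendto() call == one datagram; IP may still fragment it
    # on the wire, but recvfrom() on the other side sees it as a single message.
    sock.sendto(b"x" * 20000, ("127.0.0.1", 9999))
except OSError as exc:
    print("could not send as a single datagram:", exc)  # e.g. EMSGSIZE
```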
How do I find out the current UDP packet payload size for a UDP socket? Or is there any way to set it manually? I could not find any socket options related to that.
How do I find out the current UDP packet payload size for a UDP socket?
There isn't such a thing.
Each datagram can have its own payload size, up to the maximum as determined by the outgoing socket's socket send buffer size, up to the absolute maximum dictated by the protocol, which is 65507 for IPv4. In practice the maximum recommended size through routers is 534 bytes.
The size of a received datagram is returned by recvfrom().
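In practice you size the receive buffer for the largest datagram you expect and look at what recvfrom() actually returned, e.g.:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 9999))          # placeholder port

data, addr = sock.recvfrom(65535)     # room for the largest possible payload
print(f"received a {len(data)}-byte datagram from {addr}")
```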
Say server S has a successful TCP connection with client C.
C keeps sending 256-byte-long packets to S.
Is it possible that S receives only part of one of those packets, while the connection does not break (and it can continue to receive new packets correctly)?
I thought the TCP protocol itself guaranteed not to lose any bytes while connected. But it seems not?
P.S. I'm using Python's socketserver library.
The TCP protocol does guarantee delivery. Thus (assuming there are no bugs in your code and in the TCP stack), the scenario you describe is impossible.
Do bear in mind that TCP is stream- rather than packet-oriented. This means that you may need to call recv() multiple times to read the entire 256-byte packet.
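Since the question mentions Python, a small sketch of that loop (the 256-byte record size is taken from the question; the helper name is illustrative):

```python
def read_record(sock, size=256):
    """Call recv() as many times as needed to get one full fixed-size record."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("connection closed before a full record arrived")
        data += chunk
    return data
```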
As @NPE said, TCP is a stream-oriented protocol. That means there is no guarantee on how many data bytes are sent in each TCP packet, nor on how many bytes are available for reading on the receiving socket. What TCP ensures is that the receiving socket will be provided with the data bytes in the same order that they were sent.
Consider a communication through a TCP connection socket between two hosts A and B.
When the application in A requests to send 256 bytes, for example, A's TCP stack can send them in one packet, in several individual packets, or it can even wait before sending them. So B may receive one or several packets containing all or part of the bytes A asked to send, and when the application in B is notified that received bytes are available, there is no guarantee that it can read the full 256 bytes at once.
The only guaranteed thing is that the bytes B reads are in the same order that A sent them.
One of our customers is having trouble submitting data from our application (on their PC) to a server (different geographical location). When sending packets under 1100 bytes everything works fine, but above this we see TCP retransmitting the packet every few seconds and getting no response. The packets we are using for testing are about 1400 bytes (but less than 1472). I can send an ICMP ping to www.google.com that is 1472 bytes and get a response (so it's not their router/first few hops).
I found that our application sets the DF flag for these packets, and I believe a router along the way to the server has an MTU less than or equal to 1100 bytes and is dropping the packet.
This affects 1 client in 5000, but since everybody's routes will be different this is expected.
The data is a SOAP envelope and we expect a SOAP response back. I can't justify WHY we do it, the code to do this was written by a previous developer.
So... Are there any benefits OR justification to setting the DF flag on TCP packets for application data?
I can think of reasons it is needed for network diagnostics applications but not in our situation (we want the data to get to the endpoint, fragmented or not). One of our sysadmins said that it might have something to do with us using SSL, but as far as I know SSL is like a stream and regardless of fragmentation, as long as the stream is rebuilt at the end, there's no problem.
If there's no good justification I will be changing the behaviour of our application.
Thanks in advance.
The DF flag is typically set on IP packets carrying TCP segments.
This is because a TCP connection can dynamically change its segment size to match the path MTU, and better overall performance is achieved when the TCP segments are each carried in one IP packet.
So TCP packets have the DF flag set, which should cause an ICMP Fragmentation Needed packet to be returned if an intermediate router has to discard a packet because it's too large. The sending TCP will then reduce its estimate of the connection's Path MTU (Maximum Transmission Unit) and re-send in smaller segments. If DF wasn't set, the sending TCP would never know that it was sending segments that are too large. This process is called PMTU-D ("Path MTU Discovery").
If the ICMP Fragmentation Needed packets aren't getting through, then you're dealing with a broken network. Ideally the first step would be to identify the misconfigured device and have it corrected; however, if that doesn't work out then you add a configuration knob to your application that tells it to set the TCP_MAXSEG socket option with setsockopt(). (A typical example of a misconfigured device is a router or firewall that's been configured by an inexperienced network administrator to drop all ICMP, not realising that Fragmentation Needed packets are required by TCP PMTU-D).
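If you do end up needing that knob, the socket option looks roughly like this in Python (TCP_MAXSEG availability is platform-dependent, and the 1060-byte value is just a hypothetical figure below the suspect ~1100-byte path MTU):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Clamp the maximum segment size before connecting so the handshake reflects it.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, 1060)
sock.connect(("server.example.com", 443))   # placeholder endpoint
```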
The operation of Path-MTU discovery is described in RFC 1191, https://www.rfc-editor.org/rfc/rfc1191.
It is better for TCP to discover the Path-MTU than to have every packet over a certain size fragmented into two pieces (typically one large and one small).
Apparently, some protocols like NFS benefit from avoiding fragmentation. However, you're right in that you typically shouldn't be requesting DF unless you really require it.