Our company developed stations that collect data in an agricultural field. These field could be in the middle of nowhere, so stations use GSM/GPRS with SIM card that automatically switches to strongest provider.
Every 5 minutes, an internet connection is setup to send data to a server. The data has a structure with packet length, command, sensor data and crc check. But these data structures are sent with a http post to an url.
For 480 bytes of data, about 2550 bytes of data traffic is used. There is a lot of overhead in the HTTP protocol. Because we only need to send 480 bytes of data, we have 80% overhead with the post over http. Now we have a few hundreds of stations and that number is growing. So costs for data traffic are increasing rapidly.
We want to do a redesign of the transmission of data. The data is send by microchip microprocessor in the stations.
Our goal is to decrease the overhead as much as possible, with guaranteed data delivery. So I looked into TCP and UDP.
TCP has failure detection and recovery, but has a higher overhead.
UDP has lower overhead, but there it is not guaranteed that data is delivered without failure.
My first idea is to build a server that listens to a TCP port. And stations sent the data over TCP. Mainly because of guarantied data delivery.
With UDP we have to develop check and resend of data ourself, but the data structure of our records is already prepared to do checks.
So I am really in doubt what to do. And I am trying to get an answer on these questions:
How many bytes overhead would it take by TCP and UDP to sent (and deliver) 480 bytes of data?
Are TCP and UDP the best ways to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
How many bytes overhead would it take by TCP and UDP to sent (and deliver) 480 bytes of data?
A (typical) TCP header is 20 bytes long although it can be (slightly) longer with options. If the entire 480 bytes are sent in a single TCP segment, you'd end up with 480 +20 + 20 (IP header) = 520 bytes before layer-2 overhead.
UDP has an 8 byte header, so for UDP you'll have 480 + 8 + 20 = 508 bytes.
However you should to consider that TCP is a stream protocol. Reading from a TCP socket is like reading from a binary file - you'd need to split that stream into individual messages yourself by using some sort of delimiter or prepending the length of the message to each message.
UDP on the other hand works on individual messages. Reading from a UDP socket would return messages one at a time.
Are TCP and UDP the best ways to consider for sending 480 bytes of data, or is there a smarter solution with even lower overhead?
UDP and TCP are the lowest level transport protocols on the internet. HTTP and other high-level protocols are built on top of them. If the size of data is critical, raw TCP and UDP are as low-overhead as you're going to get without using RAW sockets and embedding your data directly into IP packets.
Related
I know, I know. This question has been asked many times before. But I've spent an hour googling now without finding what I am looking for so I will ask it again and mention my context along with what makes the decision hard for me:
I am writing the server for a game where the response time is very important and a packet loss every now and then isn't a problem.
Judging by this and the fact that I as a server mostly have to send the same data to many different clients, the obvious answer would be UDP.
I had already started writing the code when I came across this:
In some applications TCP is faster (better throughput) than UDP.
This is the case when doing lots of small writes relative to the MTU size. For example, I read an experiment in which a stream of 300 byte packets was being sent over Ethernet (1500 byte MTU) and TCP was 50% faster than UDP.
In my case the information units I'm sending are <100 bytes, which means each one fits into a single UDP packet (which is quite pleasant for me because I don't have to deal with the fragmentation) and UDP seems much easier to implement for my purpose because I don't have to deal with a huge amount of single connections, but my top priority is to minimize the time between
client sends something to server
and
client receives response from server
So I am willing to pick TCP if that's the faster way.
Unfortunately I couldn't find more information about the above quoted case, which is why I am asking: Which protocol will be faster in my case?
UDP is still going to be better for your use case.
The main problem with TCP and games is what happens when a packet is dropped. In UDP, that's the end of the story; the packet is dropped and life continues exactly as before with the next packet. With TCP, data transfer across the TCP stream will stop until the dropped packet is successfully retransmitted, which means that not only will the receiver not receive the dropped packet on time, but subsequent packets will be delayed also -- most likely they will all be received in a burst immediately after the resend of the dropped packet is completed.
Another feature of TCP that might work against you is its automatic bandwidth control -- i.e. TCP will interpret dropped packets as an indication of network congestion, and will dial back its transmission rate in response; potentially to the point of dialing it down to near zero, in cases where lots of packets are being lost. That might be useful if the cause really was network congestion, but dropped packets can also happen due to transient network errors (e.g. user pulled out his Ethernet cable for a couple of seconds), and you might not want to handle those problems that way; but with TCP you have no choice.
One downside of UDP is that it often takes special handling to get incoming UDP packets through the user's firewall, as firewalls are often configured to block incoming UDP packets by default. For an action game it's probably worth dealing with that issue, though.
Note that it's not a strict either/or option; you can always write your game to work over both TCP and UDP, and either use them simultaneously, or let the program and/or the user decide which one to use. That way if one method isn't working well, you can simply use the other one, and it only takes twice as much effort to implement. :)
In some applications TCP is faster (better throughput) than UDP. This
is the case when doing lots of small writes relative to the MTU size.
For example, I read an experiment in which a stream of 300 byte
packets was being sent over Ethernet (1500 byte MTU) and TCP was 50%
faster than UDP.
If this turns out to be an issue for you, you can obtain the same efficiency gain in your UDP protocol by placing multiple messages together into a single larger UDP packet. i.e. instead of sending 3 100-byte packets, you'd place those 3 100-byte messages together in 1 300-byte packet. (You'd need to make sure the receiving program is able to correctly intepret this larger packet, of course). That's really all that the TCP layer is doing here, anyway; placing as much data into the outgoing packets as it has available and can fit, before sending them out.
I have a question about socket programming. When I use socket to send the data, we can use the API such as sendto() to send using TCP or UDP.
For sendto(), we give a array pointer and the byte number we want to send.
In this case, if I gave a large byte number (e.g.: 20000 bytes), based on my understanding, MTU of the network will not be that big so socket actually send mutiple packets instead of one big packet. Since these 20000 bytes are split into several UDP/TCP packets, but will these 20000 bytes be seen as one packet at beginning? Is this process UDP/TCP fragmentation ?
My another question is if I put the data size smaller than MTU into sendto(), then I can gurantee call sendto() once, socket only sends one TCP/UDP packet?
Thanks in advance.
will these 20000 bytes be seen as one packet at beginning? Is this process UDP/TCP fragmentation?
UDP will send it as one datagram if your socket send buffer is large enough to hold it. Otherwise you will get EMSGSIZE. It may subsequently get fragmented at the IP layer, and if a fragment gets lost so does the whole datagram, but if all the fragments arrive the entire datagram will be received intact.
TCP wil send it all, segmenting and fragmenting it however it sees fit. It will all arrive, intact and in order, unless there is a long enough network outage.
My another question is if I put the data size smaller than MTU into sendto(), then I can guarantee call sendto() once, socket only sends one TCP/UDP packet?
UDP: yes.
TCP: no.
I've created some type of client/server application that has its own data ACK system. It was originally written in TCP because of some limitations, but the base was written thinking about UDP.
The packets that I sent to the server had their own encapsulation (packet id and packet size headers. I know that UDP has also a checksum so I didn't add a header for that), but how TCP works, I know that the server may not receive the entire packet, so I gathered and buffered the data received until a full valid packet was received.
Now I have the chance to change my client/server program to UDP, and I know that one difference with TCP is that data is not received in the same order as sent (which is why I added a packet id header).
The thing that I want to know is: If I send multiple packets, will they be received with no guaranteed order but with guaranteed encapsulation? I mean, if I send a packet sized 1000 bytes of data and another packet sized 400 bytes of data later, will the server receive 2 packets, one of 1000 bytes and another one of 400 bytes, or is there a chance to receive 200 of that 1000 bytes, then 400 bytes of that 1000 bytes and later the rest of the bytes like TCP does?
UDP is a datagram service. Datagrams may be split for transport, but they will be reassembled before being passed up to the application layer.
With small packet sizes you should have no concern that packets will be broken into multiple packets. That generally is only an issue when the packet gets over an Ethernet network.
You ask" will the server receive 2 packets, one of 1000 bytes and another one of 400 bytes, or there's a chance to receive 200 of that 1000 bytes, then 400 bytes of that 1000 bytes and later the rest of the bytes like TCP can do?
With a packet size of under 1492 bytes there is not going to be any partial packets.
UPDATE:
Apparently I see a need to clarify why I say UDP packet lengths 1492 bytes or less will not affect transport robustness.
The maximum UDP length as implicitly specified in RFC 768 is 65535 including the 8 byte Header. Max Payload Frame Length is 65527 bytes.
While this should not be disputed number the UDP data length is often reported incorrectly. This is exemplified in a previous post:
What is the largest Safe UDP Packet Size on the Internet
A data packet is not constrained by the MTU of the underlying network ToS or communications protocol's Frame length (e.g. IP and Ethernet respectively). Discrepancies between MTU and Protocol Lengths are remedied by Fragmentation and Reassembly
At the Transport Layer each network Type of Service (ToS) has a specific Maximum Transmission Unit (MTU). UDP is encapsulated within IP Packets and IP Packets are encapsulated by the transporting Network's ToS. IP packets are often transmitted through networks of various ToS which include Ethernet, PPP, HDLC, and ADCCP.
When the MTU for a receiving Network ToS is less than the sending ToS then the receiving network must Fragment the received packet. When the Network sends a packet to a network with a higher MTU, the receiving Network must reassemble any Fragmented packets.
Ethernet is the defacto mainstream protocol with the lowest MTU. Non-Mainsteam Arcnet the MTU is 507 bytes. The practical lowest MTU is Ethernet's 1500 bytes, minus the overhead makes maximum payload length 1492 bytes.
If the UDP packet has more than 1492 bytes the data packet will likely be Fragmented and Reassembled. The Fragment and Reassembly adds complexity to the already complex process coupling UDP and IP, and therefore should be avoided.
Because UDP is a non-guaranteed datagram delivery protocol it boosts the transport performance. Robustness is left to the originating and terminating Application. RFC 1166 sets the standards for the communication protocol link layer, IP layer, and transport layer, the UDP Application is responsible for packetization, reassembly, and flow control.
The maximum UDP packet size can also be lowered by a Communication Host's Application Layer. The packet length is a balance between performance and robustness.
The Communications Host's Application Layer may set a maximum UDP packet size. The typical UDP max data length at the Application layer will use the maximum allowed by the IP protocol or the Host Data Link Layer, typically Ethernet.
It is the Application's programer that chooses to use the Host Application Layer or the Host Data Link layer. The Host Application Layer will detect UDP packet errors and discard the packet if necessary. When the application communicates directly with the Host Data Link, the application then has the responsibility of detecting packet errors.
Using maximum UDP data packet length of Ethernet's max payload length of 1492 bytes will eliminate the issues of Fragmentation and Delivery Order of multiple Frames.
That is why I said packet length is not a Fragmentation issue with packet lengths of 1000 and 400 bytes.
###
I do not know what you mean by "guaranteed encapsulation", it makes no sense to me.
With IP there is no guarantee of packet delivery of the order whether UDP or TCP.
As long as you control both sides of the conversation, you can work out your own protocol within the data packet to handle ordering and post packets. Reserve the first x bytes of the packet for a sequential order number and total number of packets. (e.g. 1 of 3, 2 of 2, 3 of 3). If the client side is missing a packet then the client must send a request for retransmission. You need to determine to what level you are going to go for data integrity. like maybe the re-transmission packet is lost.
That may be what you meant by "guaranteed encapsulation", Where there is other information within your datagram packet to ensure some integrity. You should add your own CRC for the total data being sent if broken into multiple datagrams. the checksum is not very robust and is only for the one packet.
UDP is much faster then TCP but TCP has flow control and guaranteed delivery.
UDP is good for streaming content like voice where a lost packet is not going to matter.
Network reliability has improved a lot since the days when these issues were a major concern.
Say Server S have a successful TCP connection with Client C.
C is keep sending 256-byte-long packets to S.
Is it possible that one of packets only receive part of it, but the connection does not break (Can continue receive new packets correctly)?
I thought TCP protocol itself will guarantee not lose any bytes while connecting. But seems not?
P.S. I'm using Python's socketserver library.
The TCP protocol does guarantee delivery. Thus (assuming there are no bugs in your code and in the TCP stack), the scenario you describe is impossible.
Do bear in mind that TCP is stream- rather than packet-oriented. This means that you may need to call recv() multiple times to read the entire 256-byte packet.
As #NPE said, TCP is a stream oriented protocol, that means that there is no guarantee on how much data bytes are sent in each TCP packet nor how many bytes are available for reading in the receiving socket. What TCP ensures is that the receiving socket will be provided with the data bytes in the same order that they were sent.
Consider a communication through a TCP connection socket between two hosts A and B.
When the application in A requests to send 256 bytes, for example, the A's TCP stack can send them in one, or several individual packets, or even wait before sending them. So, B may receive one or several packets with all or part of the bytes requested to be sent by A, and so, when the application in B is notified of the availability of received bytes, it's not sure that it could read at once the 256 bytes.
The only guaranteed thing is that the bytes B reads are in the same order that A sent them.
I am working on Video chat application using udp sockets,
iam able to capture raw audio data which is huge in size. as it is chat application I should be able to transfer this audio data contineously.
The problem is this audio data is huge so socket mtu is not allowing me to transfer this data.
I am finding out the way I can split up this data and send through sockets and capture them at other end and combined them to produce voice data.
Please guide me how using udp sockets
With UDP you have to take care by yourself of transmission order (UDP datagram number 1 could be received AFTER a UDP datagram number 2) and lost packets (UDP doesn't grant delivery of the datagram)
You should use TCP for big size transfers where the order of the packets matters.
About MTU, you don't have to care if it is smaller than the size of the data you're going to send. The OS will defragment it for you.
Just split up the data in 64k blocks (maximum size allowed for a single send() call) and loop until your data is totally transmitted.