Do raw sockets reassemble the packet? - sockets

I am implementing simple tunneling and encryption of outgoing IP packets, i.e. each packet, including its IP header, is encrypted and prepended with a new IP header.
For this purpose I use raw sockets in the sender and the receiver.
I am trying to figure out whether fragmentation of the outgoing packets can break the ability to decrypt them again.
Do raw sockets provide the reassembled packet, or do I need to implement de-fragmentation myself?

Assuming that you are referring to RAW sockets of the Berkeley Sockets API (aka BSD Sockets),
the answer is:
No, they do not combine fragments of fragmented IP packets. You will receive the IP packets, including the IP header, just as they arrived at your network interface.
Please note that there exist various implementations of BSD sockets in different operating systems. You didn't say for which system(s) you are developing that code. And despite the fact that the POSIX standard based its network API on BSD sockets, POSIX doesn't specify RAW sockets at all, so a POSIX-conforming operating system doesn't even have to support RAW sockets.
And despite the fact that many systems have adopted the BSD API, among them Linux/Android, FreeBSD, macOS/iOS, and even Windows, there are some important differences in their implementations. E.g. they support different socket options, their socket options behave in different ways, or they support different extensions. As an example for differences in socket options, see here. So your system may theoretically have an option you can set to get reassembled packets. That would not be portable, but RAW sockets themselves have only limited portability to begin with.
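As a minimal sketch of what receiving on such a socket looks like, assuming Linux, IPv4, and CAP_NET_RAW privileges (the IPPROTO_UDP protocol filter is just an example); note that whether fragments show up here reassembled or not is exactly the OS-specific question at hand:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/ip.h>
    #include <arpa/inet.h>

    int main(void)
    {
        /* Raw IPv4 socket; only UDP-carrying IP packets are delivered to it. */
        int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);
        if (fd < 0) { perror("socket"); return 1; }

        unsigned char buf[65535];   /* maximum IPv4 packet size */
        for (;;) {
            /* Each recvfrom() delivers one IP packet, IP header included. */
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0) { perror("recvfrom"); break; }
            struct ip *iph = (struct ip *)buf;
            printf("got %zd bytes, ip_off=0x%04x\n", n, ntohs(iph->ip_off));
        }
        close(fd);
        return 0;
    }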

This is OS-specific, but generally it depends on how you read them. Take a look at a couple of the Linux docs on POSIX sockets:
packet(7)
socket(7)
recvfrom(2)
In particular, if you use SOCK_RAW, then recvfrom will not always return full packets. See the following quotes:
If a message is too long to fit in the supplied buffer, excess bytes may be discarded depending on the type of socket the message is received from.
If len is too small to fit an entire packet, the excess bytes will be returned from the next read.
The receive calls normally return any data available, up to the requested amount, rather than waiting for receipt of the full amount requested.
To your question:
Do raw sockets provide the reassembled packet, or do I need to implement de-fragmentation myself?
They don't; you need to de-fragment yourself. If the socket isn't flushed, or fragmentation occurs, the call will return whatever data is available, possibly only partial packets; the expectation is that you reassemble them yourself.
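To illustrate where de-fragmenting yourself starts, here is a minimal sketch in C, assuming Linux and the BSD-style struct ip from <netinet/ip.h> (the helper name is hypothetical): a packet is a fragment if its MF ("more fragments") flag is set or its fragment offset is nonzero. Fragments sharing the same (source, destination, protocol, ID) tuple then have to be collected and spliced together by offset.

    #include <stdint.h>
    #include <netinet/ip.h>
    #include <arpa/inet.h>

    /* Hypothetical helper: nonzero if this IPv4 packet is one fragment of a
       larger original packet. */
    int is_fragment(const struct ip *iph)
    {
        uint16_t off = ntohs(iph->ip_off);   /* 3 flag bits + 13-bit offset */
        return (off & IP_MF) || (off & IP_OFFMASK);
    }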

Related

C# BeginSend/BeginReceive sometimes send or receive data attached [duplicate]

I have two apps sending TCP packets, both written in Python 2. When the client sends TCP packets to the server too fast, the packets get concatenated. Is there a way to make Python recover only the last sent packet from the socket? I will be sending files with it, so I cannot just use some character as a packet terminator, because I don't know the content of the file.
TCP uses packets for transmission, but this is not exposed to the application. Instead, the TCP layer may decide how to break the data into packets, even fragments, and how to deliver them. Often, this happens because of the underlying network topology.
From an application point of view, you should consider a TCP connection as a stream of octets, i.e. your data unit is the byte, not a packet.
If you want to transmit "packets", use a datagram-oriented protocol such as UDP (but beware, there are size limits for such packets, and with UDP you need to take care of retransmissions yourself), or wrap them manually. For example, you could always send the packet length first, then the payload, over TCP. On the other side, read the size first, then you know how many bytes need to follow (beware, you may need to read more than once to get everything, because of fragmentation). Here, TCP will take care of in-order delivery and retransmission, so this is easier.
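As a sketch of the sending side of such a length-prefixed scheme, in C rather than the asker's Python 2 (the idea is language-independent, and the function names here are hypothetical): a 4-byte big-endian length header goes first, then the payload, looping because send() may accept fewer bytes than requested.

    #include <stdint.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    /* Send exactly len bytes, looping over short writes. */
    static int send_all(int fd, const void *buf, size_t len)
    {
        const char *p = buf;
        while (len > 0) {
            ssize_t n = send(fd, p, len, 0);
            if (n <= 0) return -1;          /* error or connection gone */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Hypothetical framing helper: 4-byte big-endian length, then payload. */
    int send_frame(int fd, const void *payload, uint32_t len)
    {
        uint32_t be_len = htonl(len);       /* length header in network order */
        if (send_all(fd, &be_len, sizeof(be_len)) < 0) return -1;
        return send_all(fd, payload, len);
    }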
TCP is a streaming protocol which doesn't expose individual packets. While reading from the stream and getting whole packets might work in some configurations, it will break with even minor changes to the operating system or networking hardware involved.
To resolve the issue, use a higher-level protocol to mark file boundaries. For example, you can prefix the file with its length in octets (bytes). Or, you can switch to a protocol that already handles this kind of thing, like HTTP.
First you need to know whether the packets are combined before they are sent or after. Use Wireshark to check if the sender is sending one packet or two. If it is sending one, then your fix is to call flush() after each write. I do not know the answer if the receiver is combining packets after receiving them.
You could change what you are sending. You could send the byte count first, followed by the bytes themselves. Then the other side would know how many bytes to read.
Normally, TCP_NODELAY prevents that by disabling Nagle's algorithm. But there are very few situations where you need to switch it on; one of the few valid ones is telnet-style applications.
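For reference, a minimal sketch of switching it on in C; note that even with Nagle's algorithm disabled, TCP is still a byte stream, so the receiver can still see writes merged or split:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Disable Nagle's algorithm so small writes are not coalesced by the
       sender's kernel before transmission. */
    int disable_nagle(int fd)
    {
        int one = 1;
        return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
    }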
What you need is a protocol on top of the tcp connection. Think of the TCP connection as a pipe. You put things in one end of the pipe and get them out of the other. You cannot just send a file through this without both ends being coordinated. You have recognised you don't know how big it is and where it ends. This is your problem. Protocols take care of this. You don't have a protocol and so what you're writing is never going to be robust.
You say you don't know the length. Get the length of the file and transmit that in a header, followed by the bytes themselves.
For example, if the header is a 64-bit number holding the length, then when you receive the header at the server end, you read the 64-bit number as the length and keep reading until you have received that many bytes, which should be the end of the file.
Of course, this is extremely simplistic, but that's the basics of it.
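A sketch of the receiving side of that kind of length-prefixed protocol, in C with hypothetical names (a real implementation should also sanity-check the received length before allocating): read exactly the header, then exactly that many payload bytes, looping because recv() may return less than asked for.

    #include <stdint.h>
    #include <stdlib.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>

    /* Receive exactly len bytes, looping over short reads. */
    static int recv_all(int fd, void *buf, size_t len)
    {
        char *p = buf;
        while (len > 0) {
            ssize_t n = recv(fd, p, len, 0);
            if (n <= 0) return -1;          /* error or orderly shutdown */
            p += n;
            len -= (size_t)n;
        }
        return 0;
    }

    /* Hypothetical helper: returns a malloc'd buffer with one whole frame. */
    char *recv_frame(int fd, uint32_t *out_len)
    {
        uint32_t be_len;
        if (recv_all(fd, &be_len, sizeof(be_len)) < 0) return NULL;
        *out_len = ntohl(be_len);           /* validate against a sane max! */
        char *payload = malloc(*out_len);
        if (payload == NULL) return NULL;
        if (recv_all(fd, payload, *out_len) < 0) { free(payload); return NULL; }
        return payload;
    }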
In fact, you don't have to design your own protocol. You could go to the internet and use an existing protocol. Such as HTTP.

Linux: checking incoming UDP datagrams

I'm working with special-purpose hardware that is connected on a 10G Ethernet link. I've got some questions on the handling of incoming datagrams, as follows:
What happens if the NIC discovers an incorrect link-level Ethernet CRC? Some searching shows that errors may not be reliably reported (here, for instance). Can I expect to get better stats from more recent kernels (2.6 - 3.10?)
What does the kernel actually check before deciding whether to return a packet to a recv? I'm guessing that for IPv4, the IPv4 header checksum must be correct, but what about the optional UDP header checksum?
Can recv ever return 0 for a UDP/SOCK_DGRAM?
For a non-blocking SOCK_DGRAM socket, does recv always return the entire packet when data is available? I guess it has to, but it's not obvious from the docs.
Thanks.
My knowledge may be out of date here, but historically, packets with FCS errors were not delivered at all and were not counted toward the interface statistics. The Ethernet layer error counts are usually reported by ethtool -S <interface>. The problem has always been that the interface statistics were maintained above the driver level and there was no standard API internally for network drivers to report those statistics. (Also, of course, in the very old days of 10Mb half duplex, collisions happened pretty frequently and Ethernet layer statistics weren't terribly informative as to your own adapter's behavior.)
You should not receive a packet if its IP header checksum is wrong, or if the UDP checksum is wrong when a checksum is provided (i.e. non-zero).
Yes. If you provide a zero-length buffer, you will receive the next incoming datagram, but its entire content will be truncated, resulting in a return value of zero. Also, UDP permits zero-length datagrams: so if you receive a datagram with no content, the return value would also be zero. Aside from those two cases, I don't believe you'll get a return value of zero.
Yes, you should get the entire datagram provided there is space in your buffer. Otherwise, no. If you don't provide enough space to hold the entire datagram, the part that doesn't fit is discarded (i.e. your next recv will get a subsequent packet, not the end of the truncated one).
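On Linux, one addition to the truncation point: recv(2) documents a MSG_TRUNC flag for datagram sockets that makes the call return the datagram's real length even when it did not fit in the buffer, so truncation can at least be detected. A minimal sketch:

    #include <stdio.h>
    #include <sys/socket.h>

    /* Read one datagram; with MSG_TRUNC the return value is the datagram's
       true length, even if only sizeof(buf) bytes were actually stored. */
    void read_one_datagram(int fd)
    {
        char buf[2048];
        ssize_t n = recv(fd, buf, sizeof(buf), MSG_TRUNC);
        if (n < 0) { perror("recv"); return; }
        if ((size_t)n > sizeof(buf))
            fprintf(stderr, "truncated: %zd bytes arrived, %zu kept\n",
                    n, sizeof(buf));
    }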

How to split datagrams that arrive merged when receiving with boost::asio UDP sockets

I've made my UDP server and client with boost::asio UDP sockets. Everything looked good until I started sending more datagrams. They arrive correctly from client to server, but they are merged in my buffer into one message.
I use
udp::socket::async_receive with a std::array<char, 1 << 18> buffer
to make the async request, and I receive the data through the callback
void on_receive(const error_code& code, size_t bytes_transferred)
If I send data too often (every 10 milliseconds), I receive several datagrams at once into my buffer via the callback above. The question is: how do I separate them? Note: my UDP datagrams have variable length. I don't want to use an additional header with the size, because that would make my code useless for third-party datagrams.
I believe this is a limitation in the way boost::asio handles stateless data streams. I noticed exactly the same behavior when using boost::asio for a serial interface. When I was sending packets with relatively large gaps between them I was receiving each one in a separate callback. As the packet size grew and the gap between the packets therefore decreased, it reached a stage when it would execute the callback only when the buffer was full, not after receipt of a single packet.
If you know exactly the size of the expected datagrams, then your solution of limiting the input buffer size is a perfectly sensible one, as you know a priori exactly how large the buffer needs to be.
If your congestion comes from having multiple different packet types being transmitted, so you can't pre-allocate a buffer of the correct size, then you could potentially create different sockets on different ports for each type of transaction. It's a little more "hacky", but given the virtually unlimited nature of ephemeral port availability, as long as you're not using 20,000 different packet types, that would probably help you out as well.
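For comparison, with the plain BSD sockets API each recvfrom() on a SOCK_DGRAM socket returns at most one datagram, since the kernel preserves datagram boundaries; a minimal C sketch of that behavior:

    #include <stdio.h>
    #include <sys/socket.h>

    /* Each recvfrom() call on a UDP socket yields exactly one datagram, so
       no separator or length header is needed at this layer. */
    void read_datagrams(int fd)
    {
        char buf[65535];                    /* larger than any UDP payload */
        for (;;) {
            ssize_t n = recvfrom(fd, buf, sizeof(buf), 0, NULL, NULL);
            if (n < 0) { perror("recvfrom"); break; }
            printf("one datagram of %zd bytes\n", n);
        }
    }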

Determining Packets Received with Winsock2

Is there a way to determine how many packets were received while using recv() with Winsock? I am looking for a solution to implement at the client, without special requirements on the server side (which I have no control over).
You'd need to packet-sniff using something like WinPcap. Then you could correlate the captured packets with the socket in use.
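A rough sketch of that approach against the libpcap API (WinPcap and its successor Npcap expose the same API on Windows); the device name and filter string below are illustrative assumptions, and you would filter on your socket's local port to correlate captures with it:

    #include <pcap/pcap.h>
    #include <stdio.h>

    static void on_packet(u_char *user, const struct pcap_pkthdr *h,
                          const u_char *bytes)
    {
        (void)h; (void)bytes;
        ++*(long *)user;                   /* bump the packet counter */
    }

    int main(void)
    {
        char errbuf[PCAP_ERRBUF_SIZE];
        /* "eth0" is an assumption; pick your actual interface. */
        pcap_t *p = pcap_open_live("eth0", 65535, 0, 1000, errbuf);
        if (!p) { fprintf(stderr, "%s\n", errbuf); return 1; }

        struct bpf_program prog;
        /* Example filter: traffic on the port your socket is bound to. */
        if (pcap_compile(p, &prog, "tcp port 12345", 1,
                         PCAP_NETMASK_UNKNOWN) == 0)
            pcap_setfilter(p, &prog);

        long count = 0;
        pcap_loop(p, 100, on_packet, (u_char *)&count); /* capture 100 packets */
        printf("captured %ld matching packets\n", count);
        pcap_close(p);
        return 0;
    }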

What's the difference between streams and datagrams in network programming?

What's the difference between stream sockets and datagram sockets? Why use one over the other?
A long time ago I read a great analogy for explaining the difference between the two. I don't remember where I read it so unfortunately I can't credit the author for the idea, but I've also added a lot of my own knowledge to the core analogy anyway. So here goes:
A stream socket is like a phone call -- one side places the call, the other answers, you say hello to each other (SYN/ACK in TCP), and then you exchange information. Once you are done, you say goodbye (FIN/ACK in TCP). If one side doesn't hear a goodbye, they will usually call the other back since this is an unexpected event; usually the client will reconnect to the server. There is a guarantee that data will not arrive in a different order than you sent it, and there is a reasonable guarantee that data will not be damaged.
A datagram socket is like passing a note in class. Consider the case where you are not directly next to the person you are passing the note to; the note will travel from person to person. It may not reach its destination, and it may be modified by the time it gets there. If you pass two notes to the same person, they may arrive in an order you didn't intend, since the route the notes take through the classroom may not be the same, one person might not pass a note as fast as another, etc.
So you use a stream socket when having information in order and intact is important. File transfer protocols are a good example here. You don't want to download some file with its contents randomly shuffled around and damaged!
You'd use a datagram socket when order is less important than timely delivery (think VoIP or game protocols), when you don't want the higher overhead of a stream (this is why DNS is primarily a datagram protocol, so that servers can respond to many, many requests at once very quickly), or when you don't care too much if the data ever reaches its destination.
To expand on the VoIP/game case, such protocols include their own data-ordering mechanism. But if one packet is damaged or lost, you don't want to wait on the stream protocol (usually TCP) to issue a re-send request -- you need to recover quickly. TCP can take up to some number of minutes to recover, and for realtime protocols like gaming or VoIP even three seconds may be unacceptable! Using a datagram protocol like UDP allows the software to recover from such an event extremely quickly, by simply ignoring the lost data or re-requesting it sooner than TCP would.
VoIP is a good candidate for simply ignoring the lost data -- one party would just hear a short gap, similar to what happens when talking to someone on a cell phone when they have poor reception. Gaming protocols are often a little more complex, but the actions taken will usually be to either ignore the missing data (if subsequently-received data supersedes the data that was lost), re-request the missing data, or request a complete state update to ensure that the client's state is in sync with the server's.
Stream Socket:
Dedicated, end-to-end channel between server and client.
Uses the TCP protocol for data transmission.
Reliable and lossless.
Data is sent and received in the same order.
Recovering lost/corrupted data takes a long time.
Datagram Socket:
No dedicated end-to-end channel between server and client.
Uses UDP for data transmission.
Not 100% reliable; data may be lost.
Data may be sent and received in a different order.
Lost/corrupted data is either ignored or recovered rapidly by the application.
If it is network programming, I think sockets are a good place to start.
socket = IP + port
There are three types of sockets (see the sketch after this list):
stream (TCP; order and delivery guaranteed, no duplication, no length or record boundaries on data, connection-oriented, reliable, supports concurrency)
datagram (UDP; packet-based, connectionless, datagram size limit, data can be lost or duplicated, order not guaranteed, unreliable)
raw (direct access to lower-layer protocols: IP, ICMP)
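A minimal sketch of creating each of the three types with the BSD sockets API (the raw socket usually needs elevated privileges, and IPPROTO_ICMP is just one example of a lower-layer protocol):

    #include <stdio.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int main(void)
    {
        int tcp_fd = socket(AF_INET, SOCK_STREAM, 0);         /* stream: TCP */
        int udp_fd = socket(AF_INET, SOCK_DGRAM, 0);          /* datagram: UDP */
        int raw_fd = socket(AF_INET, SOCK_RAW, IPPROTO_ICMP); /* raw: ICMP/IP */

        printf("tcp=%d udp=%d raw=%d (raw is -1 without privileges)\n",
               tcp_fd, udp_fd, raw_fd);
        return 0;
    }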
I do not see any strict rule tying a socket type to a particular transport protocol, and "reliability" should not be misjudged: UDP can behave reliably in practice when both ends are active.
Reliability here refers to reliability of delivery: with TCP as the transport protocol there are sequence-number checks that do not exist in UDP. It is better to use a network protocol analyzer like Wireshark or tcpdump to see what your software is actually doing; it is a kind of verification, a way of merging the theory on paper with your work in action.