Examine data in a callout driver at the FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP - virtualization

I am writing a callout driver for Hyper-V 2012 where I need to filter the packets sent from virtual machines.
I added a filter at the FWPM_LAYER_EGRESS_VSWITCH_TRANSPORT_V4 layer in WFP. The callout function receives a packet buffer which I typecast to NET_BUFFER_LIST. I do the following to get the data pointer:
pNetBuffer = NET_BUFFER_LIST_FIRST_NB((NET_BUFFER_LIST*)pClassifyData->pPacket);
pContiguousData = NdisGetDataBuffer(pNetBuffer, NET_BUFFER_DATA_LENGTH(pNetBuffer), 0, 1, 0);
I have a simple client-server application to test the packet data. The client is on a VM and the server is on another machine. As I observed, data sent from the client to the server is truncated and some garbage value is added at the end. There is no issue sending messages from the server to the client. If I don't add this layer filter, the client-server pair works without any issue.
The callback function receives metadata which includes ipHeaderSize and transportHeaderSize. Both of these values are zero. Are these values correct, or should they be non-zero?
Can somebody help me extract the data from the packet in the callout function and forward it safely to further layers?
Thank You.

These are TCP packets. I looked into the size and offset information. The problem seems consistent across packets.
I checked the following values in (NET_BUFFER_LIST*)pClassifyData->pPacket:
NET_BUFFER_LIST->NetBufferListHeader->NetBufferListData->FirstNetBuffer->NetBufferHeader->NetBufferData->CurrentMdl->MappedSystemVa
Only the first 24 bytes are sent correctly; the rest is garbage.
For example, the total size of the packet is 0x36 + 0x18 = 0x4E. I don't know what is in the first 0x36 bytes, which is constant for all the packets. Is it a TCP/IP header? The second part, 0x18 bytes, is the actual data which I sent.
I even tried the NdisQueryMdl() API to retrieve the data from the MDL chain.
So on the receiver side I get only the first 24 bytes correct and the rest is garbage. How do I read the full buffer from a NET_BUFFER_LIST?
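For reference, 0x36 = 54 bytes matches an Ethernet (14) + IPv4 (20) + TCP (20) header chain, so the constant prefix is most likely the full frame header at this vSwitch layer. Below is a minimal sketch of how NdisGetDataBuffer can be used with a backing storage buffer, so it still returns data when the packet is not contiguous in a single MDL (passing 0 for Storage makes it return NULL in that case). The pool tag 'xmpl' and the surrounding context are illustrative assumptions, not code from the original driver:

NET_BUFFER *pNetBuffer = NET_BUFFER_LIST_FIRST_NB((NET_BUFFER_LIST*)pClassifyData->pPacket);
ULONG dataLength = NET_BUFFER_DATA_LENGTH(pNetBuffer);
// A storage buffer lets NDIS copy the data when it spans multiple MDLs.
UCHAR *pStorage = (UCHAR*)ExAllocatePoolWithTag(NonPagedPoolNx, dataLength, 'xmpl');
if (pStorage != NULL)
{
    UCHAR *pContiguousData = (UCHAR*)NdisGetDataBuffer(pNetBuffer, dataLength, pStorage, 1, 0);
    if (pContiguousData != NULL)
    {
        // pContiguousData now points at dataLength contiguous bytes: either
        // directly into the original MDL chain, or into pStorage if NDIS
        // had to copy. Inspect the data here, before freeing pStorage.
    }
    ExFreePoolWithTag(pStorage, 'xmpl');
}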

Related

I cannot send short messages using the TCP protocol

I am having trouble tuning TCP client-server communication.
My current project has a client, running on a PC (C#), and a server,
running on embedded Linux 4.1.22-ltsi.
They use UDP communication to exchange data.
The client and server work in blocking mode and
send short messages to each other
(16, 60, 200 bytes etc.) that include either a command or a set of parameters.
The messages do not include any header with the message length, because
UDP is a message-oriented protocol: its recvfrom() API returns the number of received bytes.
For my server's program structure it is important to get and process each message as a whole.
The problem arose when I tried to implement TCP communication instead of UDP.
The server's receive buffer (for the recv() TCP API) is 2048 bytes:
#define UDP_RX_BUF_SIZE 2048
numbytes = recv(fd_connect, rx_buffer, UDP_RX_BUF_SIZE, MSG_WAITALL/*BLOCKING_MODE*/);
So the recv() API only returns from waiting when rx_buffer is full, i.e. after it has received
2048 bytes. This breaks the whole program approach. In other words, when the client sends a 16-byte command
to the server and waits for an answer, the server's recv() keeps the message
"in its stomach" until it has received 2048 bytes.
I tried to fix it as below, without success:
On the client side (C#) I set the socket parameter theSocket.NoDelay.
When I checked this in the sniffer I saw that the client sends the messages "as I want",
with the requested length.
On the server side I set the TCP_NODELAY socket option to 1:
int optval = 1;
setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &optval, sizeof(optval));
On the server side (Linux) I checked the socket options SO_SNDLOWAT/SO_RCVLOWAT, and they are 1 byte each.
Please see the attached sniffer's log picture. 10.0.0.10 is the client; 10.0.0.106 is the server. You can see that the client sets the PSH (push) flag, telling the server side to move the incoming data to the application immediately and not to fill a buffer.
Additional question: what are the SSH-encrypted packets that run between the two sides? I suppose my Eclipse debugger on the PC (running the server application through the same Ethernet connection) sends them. Am I right?
So, my problem is how to make the recv() API return each short message (16, 60, 200 bytes etc.) instead of accumulating them until the receive buffer fills.
TCP is connection oriented and it also maintains the order in which bytes are sent and received.
Having said that, a TCP client receives a stream of bytes, not individual messages as in UDP. So you will need to send the packet length (and possibly a marker) as the initial bytes of each message.
The receiver can then first read the packet length, read data until the packet length is reached, and then expect a new packet length, as in the sketch below.
You can also look at libraries like Netty or ZeroMQ that do this extra work for you.
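A minimal sketch of such length-prefixed framing on the receiving side, assuming a 4-byte big-endian length prefix; recv_all() and recv_message() are illustrative helpers, not standard APIs:

#include <stdint.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Read exactly len bytes, looping over short reads; returns -1 on
 * error or if the peer closes the connection early. */
static int recv_all(int fd, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
        if (n <= 0)
            return -1;
        got += (size_t)n;
    }
    return 0;
}

/* Read one framed message: 4-byte big-endian length, then the payload. */
static int recv_message(int fd, char *payload, size_t max_len, uint32_t *out_len)
{
    uint32_t net_len;
    if (recv_all(fd, &net_len, sizeof(net_len)) < 0)
        return -1;
    uint32_t len = ntohl(net_len);
    if (len > max_len)
        return -1;           /* message larger than the caller's buffer */
    if (recv_all(fd, payload, len) < 0)
        return -1;
    *out_len = len;
    return 0;
}

The sender mirrors this by prefixing every message with its htonl() length before the payload; with this scheme MSG_WAITALL is no longer needed.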

How can socketcan get send failure status?

As we all know, in the CAN bus communication protocol, the sender knows whether the data was sent successfully. I send SocketCAN data as follows:
ret = write(socket, frame, sizeof(struct can_frame));
However, even if the CAN communication cable is disconnected, the return value of ret is still 16 (= sizeof(struct can_frame)). I looked into this and found that the behaviour is due to the tx_queue of the network stack used by SocketCAN. Only when write() is called enough times to fill that buffer does the return value become -1.
But this is not the behavior I expect; I want every frame of data sent to immediately report success or failure.
With
echo 0 > /sys/class/net/can0/tx_queue_len
I tried to disable the tx_queue, but it does not work.
What I want to ask is: is there a way to disable the tx_queue of SocketCAN, or to get the status of each transmitted frame from the controller through an API (such as libsocketcan)?
Thanks.
You cannot use write() itself to discover whether a CAN frame was successfully put on the bus, because all it does is write the frame to the in-kernel socket buffer. The kernel then moves the frame to the transmit queue of the SocketCAN network interface, followed by the driver moving it to the transmit buffer of the CAN controller, which finally puts the frame on the bus. What you want is a direct write which bypasses all those buffers, but that's not possible with SocketCAN, even if you set the transmit queue length to 0.
However, there is another way to get confirmation. If you enable the CAN_RAW_RECV_OWN_MSGS socket option (see sections 4.1.4 and 4.1.7 in the SocketCAN documentation), you will receive the frames that were successfully sent. You'll need to use recvmsg() so that you get the message flags: msg_flags will have the MSG_CONFIRM bit set for a frame that was successfully sent by the same socket on which it is received. You won't be informed of failures, but you can detect them by using a timeout for the confirmation (see the sketch below).
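A minimal sketch of that approach, assuming the socket is already created and bound to a CAN interface; error handling is abbreviated and the function names are illustrative:

#include <linux/can.h>
#include <linux/can/raw.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Ask to receive our own successfully transmitted frames back. */
int enable_confirmation(int s)
{
    int recv_own = 1;
    return setsockopt(s, SOL_CAN_RAW, CAN_RAW_RECV_OWN_MSGS,
                      &recv_own, sizeof(recv_own));
}

/* Returns 1 if the received frame confirms one of our own sends
 * (MSG_CONFIRM set), 0 for an ordinary frame, -1 on error. */
int read_frame(int s, struct can_frame *frame)
{
    struct iovec iov = { .iov_base = frame, .iov_len = sizeof(*frame) };
    struct msghdr msg;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    if (recvmsg(s, &msg, 0) < 0)
        return -1;
    return (msg.msg_flags & MSG_CONFIRM) ? 1 : 0;
}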
It's not an ideal solution because it mixes the read and write logic in your application. One way to avoid this would be to use two sockets. One for writing and reading MSG_CONFIRM frames, the other for reading all other frames. You could then create a (blocking) write function that does a write() followed by multiple calls to recvmsg() with an appropriate timeout.
Finally, it is useful to enable error frames (through the CAN_RAW_ERR_FILTER socket option). If you send a frame on a socket with a disconnected cable, this will typically result in a bus off state, which will be reported in an error frame.

Confusion with AF_INET with SOCK_RAW as the socket type, vs. AF_PACKET with SOCK_DGRAM and SOCK_RAW as the socket type

I am quite new to network programming and have been trying to wrap my head around this for quite some time now. After going through numerous resources on the internet, I have reached the conclusions below, followed by my points of confusion.
Conclusion 1:
When we are talking about creating a socket as :
s = socket(AF_INET, SOCK_RAW, 0);
we are basically trying to create a raw socket. With a raw socket created this way, one would be able to bypass the TCP/UDP layer in the OSI stack. Meaning, when a packet is received by the application over this socket, the application gets the packet with the network layer (layer 3) headers wrapping the transport layer (layer 4) headers wrapping the actual data. So the application is free to process this packet beyond layer 3 in any way it wants to.
Similarly, when sending a packet through this socket, the application is free to handle packet creation down to layer 4 and then pass the packet to layer 3, from which point on the kernel handles things.
Conclusion 2: When we are talking about creating a socket as :
s = socket(AF_PACKET, SOCK_RAW, 0);
we are again trying to create a raw socket. With a raw socket created this way, one would be able to bypass all the layers of the OSI stack altogether.
A pure raw packet is available to the userland application, and it is free to do whatever it wants with that packet. Packets received over such a socket have all headers intact, and the application has access to all of them.
Similarly, when sending data over such a socket, the user application has to handle everything regarding the creation of the packet and the wrapping of the actual data with the headers of each layer before it is actually placed on the physical medium to be transmitted.
Conclusion 3: When we are talking about creating a socket as :
s = socket(AF_PACKET, SOCK_DGRAM, 0);
we are again trying to create a raw socket. With a raw socket created this way, one would be able to bypass the data link layer (layer 2) in the OSI stack. Meaning, when a packet is received over such a socket by the userland application, the data link layer header has been removed from the packet.
Similarly, when sending a packet through this socket, a suitable data link layer header is added to the packet, based on the information in the sockaddr_ll destination address.
Now below are my queries/points of confusion:
Are the conclusions that I have drawn above about raw sockets correct?
I did not quite clearly understand conclusion 3 above. Can someone please explain? Does it mean that when the userland application receives a packet through this socket, only the data link layer headers have been handled by the kernel? So the packet would be the message wrapped directly in the layer 3 headers, and subsequently by the layers above?
If the conclusions drawn above are correct, conclusions 1 and 2 still make sense. But if conclusion 3 (and the speculation around it in point 2 above) is correct, when exactly would any application ever need to do that?
Some resources that I have referred to trying to understand the above:
https://docs.freebsd.org/44doc/psd/21.ipc/paper.pdf
https://sock-raw.org/papers/sock_raw
https://www.quora.com/in/Whats-the-difference-between-the-AF_PACKET-and-AF_INET-in-python-socket
http://www.linuxcertif.com/man/7/PF_PACKET/
http://opensourceforu.com/2015/03/a-guide-to-using-raw-sockets/
'SOCK_RAW' option in 'socket' system call
http://stevendanna.github.io/blog/2013/06/23/a-short-sock-raw-adventure/
https://www.intervalzero.com/library/RTX/WebHelp/Content/PROJECTS/Application%20Development/Understanding_Network/Using_RAW_Sockets.htm
You got quite close to the real explanation. Here is what I think you are missing or have wrong.
First, for s = socket(AF_INET, SOCK_RAW, 0);: when a packet is received over such a socket, it will always contain the IP header. For sending, the packet must contain the IP header only if IP_HDRINCL is enabled; in that case the TCP/IP stack will not generate it for you, whereas without IP_HDRINCL the kernel builds the IP header itself. Everything from the IP layer upward can be received over this socket.
Secondly, s = socket(AF_PACKET, SOCK_RAW, 0);:
This is a special type of raw socket, called a packet socket on Linux systems. This type of socket allows sending and receiving packets at OSI layer 2, which is why the APIs used for such sockets are referred to as link layer APIs. Any protocol can be implemented on top of the physical layer using this socket. Interestingly, we can also interact with the packet's trailer through this socket, though we don't frequently need to.
Thirdly, in the case of s = socket(AF_PACKET, SOCK_DGRAM, 0);, your conclusion is right. With this type of packet socket, you don't need to think about the Ethernet header. It operates one layer up from the previous type.
So we can say that the main distinction among these types of sockets is which parts of the packet they give you access to. To summarize (a short code sketch follows the two tables below):
Raw-Socket access:
| Layer 3 header | Layer 4 header | Payload |
Packet-Socket access:
| Layer 2 header | Layer 3 header | Layer 4 header | Payload | Layer 2 trailer |
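As a minimal sketch of the three flavours (an illustrative assumption, with details the question's snippets gloss over): on Linux, AF_INET/SOCK_RAW needs a concrete protocol such as IPPROTO_TCP rather than 0, packet sockets take a big-endian EtherType, and all three require root or CAP_NET_RAW:

#include <sys/socket.h>
#include <netinet/in.h>      /* IPPROTO_TCP */
#include <linux/if_ether.h>  /* ETH_P_ALL */
#include <arpa/inet.h>       /* htons */

int main(void)
{
    /* Layer 3 and up: the kernel delivers IP header + TCP segment. */
    int s_inet_raw = socket(AF_INET, SOCK_RAW, IPPROTO_TCP);

    /* Layer 2 and up: full Ethernet frame, header included. */
    int s_pkt_raw = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));

    /* Layer 3 and up via a packet socket: Ethernet header stripped on
     * receive, and added by the kernel on send from the sockaddr_ll. */
    int s_pkt_dgram = socket(AF_PACKET, SOCK_DGRAM, htons(ETH_P_ALL));

    return (s_inet_raw < 0 || s_pkt_raw < 0 || s_pkt_dgram < 0) ? 1 : 0;
}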

Packet Size, Window Size and Socket Buffer in TCP

After studying the "window size" concept, what I understood is that it holds packets before they are sent over the wire, until an acknowledgement arrives for the earliest packet. Once it fills up, subsequent packets are dropped. I have also read somewhere that TCP is a streaming protocol, and that a "packet" is what the IP protocol deals with at the network layer.
What I assumed until now was that I declare a buffer (in code), fill it with some data, and send that buffer using a socket. I declared a buffer of 10000 bytes and sent it repeatedly using a socket over a 10 Gbps link.
I have the following assumptions and questions. Please verify and help:
If I want to send a packet of 64, 256, 512 etc. bytes, I declare a buffer of that size in code and send it over the socket. Each execution of the send() call will send one packet of that size.
So if I want to study the effect of packet size variation on throughput, what do I have to do? Do I need to vary the buffer size in code?
What are the socket buffers which we set using SO_SNDBUF and SO_RCVBUF? Google says it's buffer space for the socket. Is it the same as the TCP window size, or something different? Which parameter is more suitable to vary in order to increase throughput?
Also, there are three parameters for the socket buffer: min, default and max. Which one should I vary for my experiment to get the most relevant results?
If I want to send a packet of 64, 256, 512 etc. bytes, I declare a buffer of that size in code and send it over the socket. Each execution of the send() call will send one packet of that size.
Only if you disable the Nagle algorithm and the size is less than the path MTU. You mustn't rely on this.
So if I want to study the effect of packet size variation on throughput, what do I have to do? Vary the buffer size in code?
No. Vary SO_RCVBUF at the receiver. This is the single biggest determinant of throughput, as it determines the maximum receive window.
What are the socket buffers which we set using SO_SNDBUF and SO_RCVBUF?
Send buffer size at the sender, and receive buffer size at the receiver, in the kernel.
Is it the same as the TCP window size?
See above.
Or something different? Which parameter is more suitable to vary to increase throughput?
See above.
Also, there are three parameters for the socket buffer: min, default and max. Which one should I vary for my experiment?
None of them. These are the system-wide parameters. Just play with SO_SNDBUF and SO_RCVBUF for the specific sockets in your application, as sketched below.
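For illustration, a minimal sketch of setting the per-socket buffer sizes (the 1 MiB values are arbitrary examples; SO_RCVBUF is best set before connect()/listen(), because the receive window scale is negotiated during the TCP handshake):

#include <sys/socket.h>

/* Set per-socket buffer sizes before connecting, so the negotiated
 * receive window can actually grow to the requested size. */
int tune_buffers(int fd)
{
    int rcv = 1 << 20;   /* 1 MiB receive buffer (example value) */
    int snd = 1 << 20;   /* 1 MiB send buffer (example value) */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcv, sizeof(rcv)) < 0)
        return -1;
    return setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &snd, sizeof(snd));
}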
TCP does not directly expose a way to control how packets are sent, since it is a stream protocol. But you can make the TCP stack send packets out immediately by disabling the Nagle algorithm. That way, all data that you send will be sent out immediately instead of being buffered. Data will still be split into packets of MTU size, which is roughly ~1400 bytes, depending on the link.
To answer (2): disable Nagle (see the sketch below) and invoke send() with buffers of less than 1400 bytes. Use Wireshark to make sure you got what you wanted.
The buffer settings have nothing to do with any of this. I know of no valid reason to touch them.
In general this question is probably moot, since you seem to want to send a lot of data. Just leave Nagle enabled and send big buffers (such as 64 KB).
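A minimal sketch of disabling the Nagle algorithm on an already-connected socket (the helper name is illustrative):

#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_NODELAY */
#include <sys/socket.h>

/* With Nagle disabled, each send() of less than one MTU goes out as its
 * own segment instead of being coalesced with later small writes. */
int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}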
I did some experiments on Windows 10, using:
code from https://docs.python.org/3/library/socketserver.html#asynchronous-mixins,
RawCap for loopback capture,
Wireshark for watching the result.
The primary client code is:
import socket

def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 100000)
    sock.connect((ip, port))
    sock.sendall(bytes(message, 'ascii'))
    response = str(sock.recv(1024), 'ascii')
    print("Received: {}".format(response))
Here is the result (the server port is 11111):
You can see that the TCP receive window size is the same as SO_RCVBUF. This may be platform-dependent; you can verify it on other platforms.
The documentation at https://msdn.microsoft.com/en-us/library/windows/hardware/ff570832(v=vs.85).aspx,
"The SO_RCVBUF socket option determines the size of a socket's receive buffer that is used by the underlying transport.",
confirms this.
Also, when I set SO_SNDBUF = 100000, it had no effect on the TCP transmission between client and server, as the server can simply discard data if the client sends too much at once.
So if you want to tune SO_RCVBUF for maximum throughput, you can refer to http://packetbomb.com/understanding-throughput-and-tcp-windows/; the OS may offer a function to detect the ideal send backlog (ISB).

Why does the skb buffer need to be skipped by 20 bytes to read the transport header when the packet is incoming?

I am writing a network module in Linux, and I see that the TCP header can be extracted only after skipping 20 bytes from the skb buffer, even though the API is skb_transport_header.
What is the reason behind this? Can somebody please explain it in detail? The same is not required for outgoing packets. I understand that while receiving packets, the headers are removed as the packet flows from L1 to L5, and that when the packet is outgoing, the headers are added. How does this make a difference here?
/* For an input packet */
struct tcphdr *tcp;
tcp = (struct tcphdr *)(skb_transport_header(skb) + 20);

/* For an outgoing packet */
struct tcphdr *tcp;
tcp = (struct tcphdr *)(skb_transport_header(skb));
It depends on where in the stack you process the packet. Just after receipt of the packet, the transport header offset won't yet have been set. Once you've gotten to the point where it's been determined that this packet is in fact destined to the local box, that should no longer be necessary. This happens for IPv4 in ip_local_deliver_finish(). (Note that tcp_hdr(), for example, assumes that the transport_header location is already set.)
This makes total sense (even though it can be hard to determine where things like this happen in the normal flow): As each layer is recognized and processed, the starting offset of the next layer is recorded in the sk_buff. The headers aren't actually removed, the skb "data" location is just adjusted to point beyond them. And the layer-specific location is similarly adjusted.
On output, it's a little more straightforward and is done in the opposite order: transport header will be created first. Then, the network header is prepended to that, etc.
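As an illustration (an assumption-laden sketch, not from the original post): at hook points where the transport header offset has not been set yet, it is safer to derive the TCP header from the IP header's actual length than to hard-code 20 bytes, since IPv4 options make the header longer:

#include <linux/in.h>
#include <linux/ip.h>
#include <linux/skbuff.h>
#include <linux/tcp.h>

/* Derive the TCP header from the IP header length (ihl is in 32-bit
 * words), which also works when IPv4 options are present. */
static struct tcphdr *tcp_header_of(struct sk_buff *skb)
{
    struct iphdr *iph = ip_hdr(skb);   /* network header offset is set early */
    if (iph->protocol != IPPROTO_TCP)
        return NULL;
    return (struct tcphdr *)((unsigned char *)iph + iph->ihl * 4);
}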