XDP and sk_buff - ebpf

I started coding with eBPF and XDP.
I am using Python BCC to load the XDP program onto the NICs.
I am trying to work with the __sk_buff structure, but when I try to access any field of skb, the verifier fails to load the program.
int xdp_test(struct sk_buff *skb)
{
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;
    if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) < data_end)
    {
        struct iphdr *ip = ip_hdr(skb);
        // according to my checks, it fails because of this line. I can't access ip->protocol (or any other fields)
        if (ip->protocol == IPPROTO_TCP)
        {
            return XDP_PASS;
        }
    }
    ...
    return XDP_PASS;
}
I just want to calculate the layer 4 checksum in my program using bpf_l4_csum_replace, which takes skb as its first argument.
Why is that happening?
Can I even use the __sk_buff structure in XDP? Or do I have to use the xdp_md struct?
UPDATE:
Thanks to Qeole, I understood that I cannot use sk_buff with XDP.
Is there a way to calculate the TCP checksum using xdp_md?

Indeed you cannot use the struct __sk_buff in XDP programs. You have to use the struct xdp_md instead. XDP performance is due in great part to the kernel calling the eBPF program before the allocation and initialisation of the socket buffer (struct sk_buff in the kernel), which saves time and resources, but it also means you don't have access to that structure in your XDP program.
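To illustrate, here is a minimal, untested sketch of the same parsing logic written against struct xdp_md: packet data is reached through ctx->data and ctx->data_end, and every header is bounds-checked against data_end before it is dereferenced, which is what the verifier wants to see. The includes are what a typical libbpf/BCC-style build would use. Note that helpers taking an skb, such as bpf_l4_csum_replace, are not available to XDP programs; in XDP the checksum is typically recomputed by hand or with bpf_csum_diff.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>

int xdp_test(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)   /* bounds check before any access */
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    if (ip->protocol == IPPROTO_TCP) {
        /* parse the TCP header here and adjust the checksum manually */
    }
    return XDP_PASS;
}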

Related

BPF program using XDP returns failed to load BPF skeleton (-22)

Firstly, I was not using the libbpf API directly, nor BCC. Instead, I was trying to use the API of the skeleton generated by bpftool.
Control code:
obj_gen = bpf_xdp_c__open();
if (!obj_gen)
    goto cleanup;
ifindex = if_nametoindex("eth0");
if (!ifindex)
{
    perror("if_nametoindex");
    return 1;
}
err = bpf_xdp_c__load(obj_gen);
BPF code:
// Simple XDP BPF program. Every packet will be dropped.
SEC("test")
int xdp_prog1(struct xdp_md *ctx){
    char drop_message[] = "XDP PACKET DROP\n";
    bpf_trace_printk(&drop_message, sizeof(drop_message));
    return XDP_DROP;
}
So, after running, the error below was shown:
// libbpf: load bpf program failed: Invalid argument
// libbpf: failed to load program 'test'
// libbpf: failed to load object 'bpf_xdp_c'
// libbpf: failed to load BPF skeleton 'bpf_xdp_c': -22
After debugging, I noticed that the program type did not have the correct value; it was always 0. So I had to add the following line before the load call:
obj_gen->progs.xdp_prog1->type = BPF_PROG_TYPE_XDP;
Because struct bpf_program is defined in libbpf.c, I had to redefine it in my own header so that the compiler could find it. This workaround worked.
Q: Is there a better solution?
I found what was wrong. After looking at the libbpf source code, I found the variable
static const struct bpf_sec_def section_defs[] = { ... BPF_PROG_SEC("xdp", BPF_PROG_TYPE_XDP), ... };
So I noticed that I had not defined the XDP section in my BPF program with SEC("xdp").
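For clarity, the working program differs only in the section name, which lets libbpf infer BPF_PROG_TYPE_XDP on its own (the rest is unchanged from the snippet above):

SEC("xdp")
int xdp_prog1(struct xdp_md *ctx){
    char drop_message[] = "XDP PACKET DROP\n";
    bpf_trace_printk(&drop_message, sizeof(drop_message));
    return XDP_DROP;
}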
Thanks @Qeole for your help.

I have an error trying to access iphdr using eBPF

So I've been trying to access the iphdr using eBPF.
static inline int parse_ipv4(void *data, u64 nh_off, void *data_end) {
    struct iphdr *iph = data + nh_off;
    if ((void *)&iph[1] > data_end)
        return 0;
    return iph->protocol;
}
When I use the code above in the eBPF function, it works fine:
if (h_proto == htons(ETH_P_IP)){
    index = parse_ipv4(data, nh_off, data_end);
Called like this, the parse_ipv4 function works.
However, if I try to access the IP header directly, without using the function, it doesn't work.
if (h_proto == htons(ETH_P_IP)){
    index = parse_ipv4(data, nh_off, data_end);
    struct iphdr *iph2 = sizeof(*eth) + nh_off;
}
This gives me an error: HINT: The invalid mem access 'inv' error can happen if you try to dereference memory without first using bpf_probe_read() to copy it to the BPF stack. Sometimes the bpf_probe_read is automatic by the bcc rewriter, other times you'll need to be explicit.
and the program fails to load.
Thank you so much in advance!
Unless I misunderstand your program, the following:
struct iphdr *iph2 = sizeof(*eth) + nh_off;
looks erroneous. Instead, iph2 should be something like data + nh_off, just as in your function, no? If you set it to the sum of two sizes, without any base address, then you try to access data at an arbitrary memory location (something like 0x28 I guess), which of course is not permitted.
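A minimal sketch of that correction, assuming the usual data/data_end layout from the question (the fallback return value is just a placeholder, use whatever fits your program):

struct iphdr *iph2 = data + nh_off;     /* base address + offset, not a sum of sizes */
if ((void *)&iph2[1] > data_end)        /* bounds check so the verifier accepts the access */
    return XDP_DROP;                    /* or your program's fallback action */
index = iph2->protocol;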

how to read data from quectel L89 GPS module in stm32 using HAL_UART_Receive()?

I am using an STM32F103C8T6 board and CubeMX to generate the code. I need to receive GPS data from the Quectel L89 module on the UART2 port. When I try that, I get only some junk values... I am using HAL_UART_Receive to receive the data and print it to the PuTTY console. Any help would be greatly appreciated.
This is my code.
void task1(void)
{
    char *buffer = NULL;
    buffer = (char *)malloc(400 * sizeof(char));
    while (1)
    {
        HAL_UART_Receive(&huart2, buffer, 350, 500);
        int size = strlen(buffer);
        HAL_UART_Transmit(&huart1, buffer, size, 500);
        HAL_Delay(1000);
    }
}
Image of the Result
Try this:
HAL_UART_Receive(&huart2, (uint8_t *)buffer, 350, 500);
and
HAL_UART_Transmit(&huart1, (uint8_t *)buffer, size, 500);
because the data arguments of these HAL functions are of type uint8_t *.
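Applied in context, the loop from the question would look like this; everything other than the two casts is kept as in the question, purely as an illustrative sketch:

void task1(void)
{
    char *buffer = (char *)malloc(400 * sizeof(char));
    while (1)
    {
        /* cast the char buffer to uint8_t * as the HAL prototypes expect */
        HAL_UART_Receive(&huart2, (uint8_t *)buffer, 350, 500);
        int size = strlen(buffer);
        HAL_UART_Transmit(&huart1, (uint8_t *)buffer, size, 500);
        HAL_Delay(1000);
    }
}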

Extract frames from pcap files (tcpdump output) without using Libraries

I need to parse pcap files and count the packets separately (TCP, UDP, IP). I found a lot of libraries for this, like libpcap and jnetpcap, but I want to do this without using any external libraries. I do not need code, just a conceptual explanation.
Question
While parsing pcap files, how should I distinguish between the frames (be it TCP, UDP, or IP)? I tried reading about the format, but what I do not understand is how I would know how many bytes to read for a particular frame and what type of frame it is. Only once I am able to extract the packets separately will I be able to filter out other information.
You'd have to parse each frame separately and keep a counter for each value you are trying to count. Assuming the capture you are examining is in pcap/pcapng format, you might find libpcap helpful.
To give a quick outline of what you might have to do (assuming the lowest layer is Ethernet without VLAN tags):
#include <stdint.h>
#include <arpa/inet.h>

uint64_t ip_count, tcp_count, udp_count;

void parse_pkt(uint8_t *data, uint32_t data_len) {
    /* EtherType is the 2-byte field at offset 12 of the Ethernet header */
    uint16_t ether_type = ntohs(*(uint16_t *) (data + 12));
    if (ether_type != 0x0800) {
        return;
    }
    ip_count += 1;
    /* The IPv4 header starts after the 14-byte Ethernet header;
       the protocol field is the single byte at offset 9 */
    uint8_t *ip_hdr = data + 14;
    uint8_t protocol = ip_hdr[9];
    /* protocol is udp/tcp/sctp...etc */
    if (protocol == 0x11) {
        udp_count++;
    } else if (protocol == 0x06) {
        tcp_count++;
    }
}
// for each packet read from the capture, call parse_pkt with its data and captured length
This code is fragile. Jumping to direct offsets without the proper length and type checks is not a good idea.
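For the part of the question about knowing how many bytes to read per frame without a library: in the classic pcap format (not pcapng) the file starts with a 24-byte global header, and each captured frame is preceded by a 16-byte record header whose incl_len field gives the number of frame bytes stored in the file. A rough sketch, assuming a capture written in native byte order (magic 0xa1b2c3d4) and reusing the parse_pkt counter function from above:

#include <stdint.h>
#include <stdio.h>

void parse_pkt(uint8_t *data, uint32_t data_len);   /* from the snippet above */

struct pcap_file_header {          /* 24 bytes at the start of the file */
    uint32_t magic;
    uint16_t version_major, version_minor;
    int32_t  thiszone;
    uint32_t sigfigs;
    uint32_t snaplen;
    uint32_t linktype;             /* 1 = Ethernet */
};

struct pcap_record_header {        /* 16 bytes before every frame */
    uint32_t ts_sec, ts_usec;
    uint32_t incl_len;             /* bytes of this frame stored in the file */
    uint32_t orig_len;             /* original frame length on the wire */
};

void read_capture(FILE *f) {
    struct pcap_file_header fh;
    if (fread(&fh, sizeof fh, 1, f) != 1 || fh.magic != 0xa1b2c3d4)
        return;                    /* not classic pcap, or byte-swapped */
    struct pcap_record_header rh;
    static uint8_t frame[65535];
    while (fread(&rh, sizeof rh, 1, f) == 1 && rh.incl_len <= sizeof frame) {
        if (fread(frame, 1, rh.incl_len, f) != rh.incl_len)
            break;
        parse_pkt(frame, rh.incl_len);   /* count IP/TCP/UDP as above */
    }
}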

Is select() + non-blocking write() possible on a blocking pipe or socket?

The situation is that I have a blocking pipe or socket fd to which I want to write() without blocking, so I do a select() first, but that still doesn't guarantee that write() will not block.
Here is the data I have gathered. Even if select() indicates that writing is possible, writing more than PIPE_BUF bytes can block. However, writing at most PIPE_BUF bytes doesn't seem to block in practice, but that is not mandated by the POSIX spec, which only specifies atomic behavior. The Python(!) documentation states that:

Files reported as ready for writing by select(), poll() or similar interfaces in this module are guaranteed to not block on a write of up to PIPE_BUF bytes. This value is guaranteed by POSIX to be at least 512.

In the following test program, set BUF_BYTES to, say, 100000 to block in write() on Linux, FreeBSD or Solaris following a successful select. I assume that named pipes have similar behavior to anonymous pipes.

Unfortunately the same can happen with blocking sockets. Call test_socket() in main() and use a largish BUF_BYTES (100000 is good here too). It's unclear whether there is a safe buffer size like PIPE_BUF for sockets.
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <limits.h>
#include <sys/select.h>

#define BUF_BYTES PIPE_BUF

char buf[BUF_BYTES];
int
probe_with_select(int nfds, fd_set *readfds, fd_set *writefds,
                  fd_set *exceptfds)
{
    struct timeval timeout = {0, 0};
    int n_found = select(nfds, readfds, writefds, exceptfds, &timeout);
    if (n_found == -1) {
        perror("select");
    }
    return n_found;
}

void
check_if_readable(int fd)
{
    fd_set fdset;
    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    printf("select() for read on fd %d returned %d\n",
           fd, probe_with_select(fd + 1, &fdset, 0, 0));
}
void
check_if_writable(int fd)
{
    fd_set fdset;
    FD_ZERO(&fdset);
    FD_SET(fd, &fdset);
    int n_found = probe_with_select(fd + 1, 0, &fdset, 0);
    printf("select() for write on fd %d returned %d\n", fd, n_found);
    /* if (n_found == 0) { */
    /*     printf("sleeping\n"); */
    /*     sleep(2); */
    /*     int n_found = probe_with_select(fd + 1, 0, &fdset, 0); */
    /*     printf("retried select() for write on fd %d returned %d\n", */
    /*            fd, n_found); */
    /* } */
}
void
test_pipe(void)
{
    int pipe_fds[2];
    ssize_t written;
    int i;

    if (pipe(pipe_fds)) {
        perror("pipe failed");
        _exit(1);
    }
    printf("read side pipe fd: %d\n", pipe_fds[0]);
    printf("write side pipe fd: %d\n", pipe_fds[1]);
    for (i = 0; ; i++) {
        printf("i = %d\n", i);
        check_if_readable(pipe_fds[0]);
        check_if_writable(pipe_fds[1]);
        written = write(pipe_fds[1], buf, BUF_BYTES);
        if (written == -1) {
            perror("write");
            _exit(-1);
        }
        printf("written %zd bytes\n", written);
    }
}
void
serve()
{
    int listenfd = 0, connfd = 0;
    struct sockaddr_in serv_addr;

    listenfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_addr.s_addr = htonl(INADDR_ANY);
    serv_addr.sin_port = htons(5000);
    bind(listenfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr));
    listen(listenfd, 10);
    connfd = accept(listenfd, (struct sockaddr *)NULL, NULL);
    sleep(10);
}
int
connect_to_server()
{
    int sockfd = 0;
    struct sockaddr_in serv_addr;

    if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
        perror("socket");
        exit(-1);
    }
    memset(&serv_addr, 0, sizeof(serv_addr));
    serv_addr.sin_family = AF_INET;
    serv_addr.sin_port = htons(5000);
    if (inet_pton(AF_INET, "127.0.0.1", &serv_addr.sin_addr) <= 0) {
        perror("inet_pton");
        exit(-1);
    }
    if (connect(sockfd, (struct sockaddr *)&serv_addr, sizeof(serv_addr)) < 0) {
        perror("connect");
        exit(-1);
    }
    return sockfd;
}
void
test_socket(void)
{
    if (fork() == 0) {
        serve();
    } else {
        int fd;
        int i;
        ssize_t written;

        sleep(1);
        fd = connect_to_server();
        for (i = 0; ; i++) {
            printf("i = %d\n", i);
            check_if_readable(fd);
            check_if_writable(fd);
            written = write(fd, buf, BUF_BYTES);
            if (written == -1) {
                perror("write");
                _exit(-1);
            }
            printf("written %zd bytes\n", written);
        }
    }
}
int
main(void)
{
    test_pipe();
    /* test_socket(); */
}
Unless you wish to send one byte at a time whenever select() says the fd is ready for writes, there is really no way to know how much you will be able to send. Even then, it is theoretically possible (at least in the documentation, if not in the real world) for select to say the fd is ready for writes and then for the condition to change in the time between select() and write().
Non-blocking sends are the solution here, and you don't need to change your file descriptor to non-blocking mode to send one message in non-blocking form if you switch from write() to send(). The only thing you need to change is to add the MSG_DONTWAIT flag to the send call; that makes the single send non-blocking without altering your socket's properties. You don't even need to use select() at all in this case, since the send() call gives you all the information you need in its return code: if you get a return code of -1 and errno is EAGAIN or EWOULDBLOCK, then you know you can't send any more.
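A minimal sketch of that suggestion (the wrapper name and the "return 0 on would-block" convention are mine, just for illustration):

#include <sys/socket.h>
#include <errno.h>

/* One non-blocking send attempt on an otherwise blocking socket. */
ssize_t try_send(int fd, const void *buf, size_t len)
{
    ssize_t n = send(fd, buf, len, MSG_DONTWAIT);
    if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* socket buffer is full right now; try again later */
    return n;       /* bytes actually queued, or -1 on a real error */
}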
The POSIX section you cite clearly states:
[for pipes] If the O_NONBLOCK flag is clear, a write request may cause the thread to block, but on normal completion it shall return nbyte.
[for streams, which presumably includes streaming sockets] If O_NONBLOCK is clear, and the STREAM cannot accept data (the STREAM write queue is full due to internal flow control conditions), write() shall block until data can be accepted.
The Python documentation you quoted can therefore apply only to non-blocking mode. But as you're not using Python, it has no relevance anyway.
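For completeness, a small sketch of putting a descriptor into non-blocking mode with fcntl(), which is the case the quoted documentation is really about (the helper name is mine):

#include <fcntl.h>

/* Set O_NONBLOCK on an existing descriptor; returns -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}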
The answer by ckolivas is the correct one but, having read this post, I thought I could add some test data for interest's sake.
I quickly wrote a slow-reading TCP server (sleeping 100 ms between reads) which did a read of 4 KB on each cycle, then a fast-writing client which I used for testing various write scenarios. Both used select before read (server) or write (client).
This was on Linux Mint 18 running under a Windows 7 VM (VirtualBox) with 1GB of memory assigned.
For the blocking case:
If a write of a "certain number of bytes" became possible, select returned and the write either completed in total immediately or blocked until it completed. On my system, this "certain number of bytes" was at least 1MB. On the OP's system, this was clearly much less (less than 100,000).
So select did not return until a write of at least 1MB was possible. There was never a case (that I saw) where select would return if a smaller write would subsequently block. Thus select + write(x), where x was 4K, 8K or 128K, never blocked on write on this system.
This is all very well of course but this was an unloaded VM with 1GB of memory. Other systems would be expected to be different. However, I would expect that writes below a certain magic number (PIPE_BUF perhaps), issued subsequent to a select, would never block on all POSIX compliant systems. However (again) I don't see any documentation to that effect so one can't rely on that behaviour (even though the Python documentation clearly does). As the OP says, it's unclear whether there is a safe buffer size like PIPE_BUF for sockets. Which is a pity.
Which is what ckolivas' post says even though I'd argue that no rational system would return from a select when only a single byte was available!
Extra information:
At no point (in normal operation) did write return anything other than the full amount requested (or an error).
If the server was killed (ctrl-c), the client side write would immediately return a value (usually less than was requested - not normal operation!) with no other indication of error. The next select call would return immediately and the subsequent write would return -1 with errno saying "Connection reset by peer". Which is what one would expect - write as much as you can this time, fail the next time.
This (and EINTR) appears to be the only time write returns a number > 0 but less than requested.
If the server side was reading and the client was killed, the server continued to read all available data until it ran out. Then it read a zero and closed the socket.
For the non-blocking case:
The behaviour below some magic value is the same as above. select returns, write doesn't block (of course) and the write completes in its totality.
My issue was what happens otherwise. The send(2) man page says that in non-blocking mode, send fails with EAGAIN or EWOULDBLOCK. Which might imply (depending on how you read it) that it's all or nothing. Except that it also says select may be used to determine when it is possible to send more data. So it can't be all or nothing.
The write(2) man page (write is the same as send with no flags) says it can return less than requested. This nitpicking seems pedantic, but the man pages are the gospel, so I read them as such.
In testing, a non-blocking write larger than some particular value returned less than requested. This value wasn't constant; it changed from write to write, but it was always pretty large (more than 1 to 2 MB).