Sending UDP packet in Linux Kernel - sockets

For a project, I'm trying to send UDP packets from Linux kernel-space. I'm currently 'hard-coding' my code into the kernel (which I appreciate isn't the best/neatest way) but I'm trying to get a simple test to work (sending "TEST"). It should be mentioned I'm a newbie to kernel hacking - I'm not that clued up on many principles and techniques!
Every time my code runs, the system hangs and I have to reboot - no mouse/keyboard response, and the Scroll Lock and Caps Lock lights flash together. I'm not sure what this means, but I'm assuming it's a kernel panic?
The repeat_send code is unnecessary for this test, but once it's working I want to send large messages that may require multiple sends - could that be a cause of my issues?
N.B. This code is being inserted into neighbour.c in linux-source/net/core/, hence the use of NEIGH_PRINTK1; it's just a macro wrapper around printk.
I'm really banging my head against a brick wall here; I can't spot anything obvious. Can anyone point me in the right direction (or spot that blindingly obvious error)?
Here's what I have so far:
void mymethod()
{
    struct socket sock;
    struct sockaddr_in addr_in;
    int ret_val;
    unsigned short port = htons(2048);
    unsigned int host = in_aton("192.168.1.254");
    unsigned int length = 5;
    char *buf = "TEST\0";
    struct msghdr msg;
    struct iovec iov;
    int len = 0, written = 0, left = length;
    mm_segment_t oldmm;

    NEIGH_PRINTK1("forwarding sk_buff at: %p.\n", skb);

    if ((ret_val = sock_create(PF_INET, SOCK_DGRAM, IPPROTO_UDP, &sock)) < 0) {
        NEIGH_PRINTK1("Error during creation of socket; terminating. code: %d\n", ret_val);
        return;
    }

    memset(&addr_in, 0, sizeof(struct sockaddr_in));
    addr_in.sin_family = AF_INET;
    addr_in.sin_port = port;
    addr_in.sin_addr.s_addr = host;

    if ((ret_val = sock.ops->bind(&sock, (struct sockaddr *)&addr_in, sizeof(struct sockaddr_in))) < 0) {
        NEIGH_PRINTK1("Error trying to bind socket. code: %d\n", ret_val);
        goto close;
    }

    memset(&msg, 0, sizeof(struct msghdr));
    msg.msg_flags = 0;
    msg.msg_name = &addr_in;
    msg.msg_namelen = sizeof(struct sockaddr_in);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = NULL;
    msg.msg_controllen = 0;

repeat_send:
    msg.msg_iov->iov_len = left;
    msg.msg_iov->iov_base = (char *)buf + written;

    oldmm = get_fs();
    set_fs(KERNEL_DS);
    len = sock_sendmsg(&sock, &msg, left);
    set_fs(oldmm);

    if (len == -ERESTARTSYS)
        goto repeat_send;
    if (len > 0) {
        written += len;
        left -= len;
        if (left)
            goto repeat_send;
    }

close:
    sock_release(&sock);
}
Any help would be hugely appreciated, thanks!

You may find it easier to use the netpoll API for UDP. Take a look at netconsole for an example of how it's used. The APIs you're using are more intended for userspace (you should never have to play with segment descriptors to send network data!)
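For instance, a minimal sketch of the netconsole-style approach (field names follow the older IPv4-only struct netpoll, so check include/linux/netpoll.h for your kernel version; the interface name and addresses here are made up):
#include <linux/netpoll.h>
#include <linux/inet.h>

static struct netpoll np;

static int my_netpoll_init(void)
{
    memset(&np, 0, sizeof(np));
    np.name = "myudp";                         /* label used in log messages */
    strlcpy(np.dev_name, "eth0", IFNAMSIZ);    /* assumed outgoing interface */
    np.local_ip = in_aton("192.168.1.1");      /* assumed local address */
    np.remote_ip = in_aton("192.168.1.254");
    np.local_port = 6665;
    np.remote_port = 2048;
    memset(np.remote_mac, 0xff, ETH_ALEN);     /* broadcast MAC; netpoll won't ARP for you */
    return netpoll_setup(&np);
}

static void my_netpoll_send(void)
{
    netpoll_send_udp(&np, "TEST", 4);
}
Once netpoll_setup() succeeds, netpoll_send_udp() is safe to call from almost any context, which is exactly why netconsole uses it.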

Run your code when you're in a text mode console (i.e. press Ctrl+Alt+F1 to go to the text console). This way a kernel panic will print out the stack trace and any extra information about what went wrong.
If that doesn't help you, update your question with the stack trace.

I'm not much of a Linux Kernel developer, but can you throw some printk's in there and watch dmesg before it goes down? Or have you thought about hooking up with a kernel debugger?

I think you should try to put all variables outside the mymethod() function and make them static. Remember that the size of the kernel stack is limited to 8 KiB, so too many or too large local variables can cause a stack overflow and a system hang.
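For example, a sketch of that suggestion applied to the code above (note that static variables make the function non-reentrant):
/* Moved out of mymethod() so they live in .bss rather than on the
 * 8 KiB kernel stack. */
static struct socket sock;
static struct sockaddr_in addr_in;
static struct msghdr msg;
static struct iovec iov;

void mymethod()
{
    /* ... same body as before, minus these local declarations ... */
}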

Related

Zerocopy TCP message sending from kernel module error EFAULT

I am working on a kernel module which receives data over DMA from an FPGA and stores it in a ring buffer allocated with dma_alloc_attrs(dev, size, &data->dma_addr, GFP_KERNEL, DMA_ATTR_FORCE_CONTIGUOUS). Every time new data is available in the ring buffer, a completion is fired.
In the same kernel module, I am running a TCP server, and during the lifetime of the kernel module only one client (on a different machine) connects to the server (and stays connected). A separate thread in the kernel module sends the data received in the ring buffer to the connected client whenever the completion is fired. The idea behind having a TCP server in kernel space is to get rid of the unnecessary context switches between kernel space and user space whenever data is sent to the client, thus increasing performance. So far everything works, but the performance isn't as expected (on the TCP side).
After looking a bit into how to increase performance, I found the ZEROCOPY option.
I changed the settings of the server socket to set the SO_ZEROCOPY flag, kernel_setsockopt(socket, SOL_SOCKET, SO_ZEROCOPY, (char *)&one, sizeof(one)), and changed the implementation of the sending to the client to:
static DEFINE_MUTEX(tcp_send_mtx);

static int send(struct socket *sock, const char *buf,
                const size_t length, unsigned long flags)
{
    struct msghdr msg;
    struct kvec vec;
    int len, written = 0;
    int left = length;

    if (sock == NULL)
    {
        printk(KERN_ERR MODULE_NAME ": tcp server send socket is NULL\n");
        return -EFAULT;
    }

    msg.msg_name = 0;
    msg.msg_namelen = 0;
    msg.msg_control = NULL;
    msg.msg_controllen = 0;
    msg.msg_flags = MSG_ZEROCOPY;

repeat_send:
    vec.iov_len = left;
    vec.iov_base = (char *)buf + written;

    len = kernel_sendmsg(sock, &msg, &vec, left, left);
    if ((len == -ERESTARTSYS) || (!(flags & MSG_DONTWAIT) && (len == -EAGAIN)))
        goto repeat_send;
    if (len > 0)
    {
        written += len;
        left -= len;
        if (left)
            goto repeat_send;
    }
    return written ? written : len;
}
Note the msg.msg_flags = MSG_ZEROCOPY; assignment in the send function.
Now when I try to use this, I get an EFAULT (-14) error from kernel_sendmsg, just by adding the MSG_ZEROCOPY flag.
UPDATE:
I understand now that the ZEROCOPY flag is wrongly used in kernel space, since it's designed to remove the additional copy between user space and kernel space.
My initial problem still exists: the TCP transfer is still slow, and the ring buffer overflows when the DMA transfer speed exceeds 120 MB/s. The thread that forwards the messages to the client cannot send the 8 KB messages faster than 120 MB/s.
Does anyone know what is wrong here? Maybe the idea is wrong in the first place.

eBPF Newbie: Need help, facing an error while loading eBPF code

I wrote a BPF program and compiled it with clang; while trying to load it, I face an error. I am not able to understand why, or how to resolve it, and need expert advice.
I am running this code in a VM
OS : Ubuntu 18.04.2
Kernel : Linux 4.18.0-15-generic x86_64
I have tried simple programs and was able to load them, but not this program.
static __inline int clone_netflow_record (struct __sk_buff *skb, unsigned long dstIpAddr)
{
    return XDP_PASS;
}

static __inline int process_netflow_records( struct __sk_buff *skb)
{
    int i = 0;

    #pragma clang loop unroll(full)
    for (i = 0; i < MAX_REPLICATIONS; i++) {
        clone_netflow_record (skb, ipAddr[i]);
    }
    return XDP_DROP;
}

__section("action")
static int probe_packets(struct __sk_buff *skb)
{
    /* We will access all data through pointers to structs */
    void *data = (void *)(long)skb->data;
    void *data_end = (void *)(long)skb->data_end;

    if (data > data_end)
        return XDP_DROP;

    /* for easy access we re-use the Kernel's struct definitions */
    struct ethhdr *eth = data;
    struct iphdr *ip = (data + sizeof(struct ethhdr));

    /* Only actual IP packets are allowed */
    if (eth->h_proto != __constant_htons(ETH_P_IP))
        return XDP_DROP;

    /* If Netflow packets process it */
    if (ip->protocol != IPPROTO_ICMP)
    {
        process_netflow_records (skb);
    }
    return XDP_PASS;
}
ERROR Seen:
$ sudo ip link set dev enp0s8 xdp object clone.o sec action
Prog section 'action' rejected: Permission denied (13)!
- Type: 6
- Instructions: 41 (0 over limit)
- License: GPL
Verifier analysis:
0: (bf) r2 = r1
1: (7b) *(u64 *)(r10 -16) = r1
2: (79) r1 = *(u64 *)(r10 -16)
3: (61) r1 = *(u32 *)(r1 +76)
invalid bpf_context access off=76 size=4
Error fetching program/map!
The verifier that enforces checks on programs loaded into the Linux kernel ensures that no out-of-bounds accesses are attempted. Your program is rejected because it may trigger such an out-of-bounds access.
If we have a closer look at your snippet:
void *data = (void *)(long)skb->data;
void *data_end = (void *)(long)skb->data_end;
So here we get pointers to data (start of packet) and data_end.
if (data > data_end)
return XDP_DROP;
The above check is unnecessary (data will not be higher than data_end). But there's another check you should do here instead. Let's see below:
/* for easy access we re-use the Kernel's struct definitions */
struct ethhdr *eth = data;
struct iphdr *ip = (data + sizeof(struct ethhdr));
/* Only actual IP packets are allowed */
if (eth->h_proto != __constant_htons(ETH_P_IP))
return XDP_DROP;
What you do here is, first, make eth and ip point to the start of the packet and (supposedly) the start of the IP header. This step is fine. But then, you try to dereference eth to access its h_proto field.
Now, what would happen if the packet was not Ethernet, and it was not long enough to have an h_proto field in it? You would try to read some data outside of the bounds of the packet, this is the out-of-bound access I mentioned earlier. Note that it does not mean your program actually tried to read this data (as a matter of fact, I don't see how you could get a packet shorter than 14 bytes). But from the verifier's point of view, it is technically possible that this forbidden access could occur, so it rejects your program. This is what it means with invalid bpf_context access: your code tries to access the context (for XDP: packet data) in an invalid way.
So how do we fix that? The check that you should have before trying to dereference the pointer should not be on data > data_end, it should be instead:
if (data + sizeof(struct ethhdr) > data_end)
return XDP_DROP;
So if we pass the check without returning XDP_DROP, we are sure that the packet is long enough to contain a full struct ethhdr (and hence a h_proto field).
Note that a similar check on data + sizeof(struct ethhdr) + sizeof(struct iphdr) will be necessary before trying to dereference ip, for the same reason. Each time you try to access data from the packet (the context), you should make sure that your packet is long enough to dereference the pointer safely.
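Put together, the entry checks could look like this (a sketch following the reasoning above):
struct ethhdr *eth = data;
struct iphdr *ip = data + sizeof(struct ethhdr);

/* The Ethernet header must be within bounds before reading h_proto */
if (data + sizeof(struct ethhdr) > data_end)
    return XDP_DROP;
if (eth->h_proto != __constant_htons(ETH_P_IP))
    return XDP_DROP;

/* The IP header must be within bounds before reading ip->protocol */
if (data + sizeof(struct ethhdr) + sizeof(struct iphdr) > data_end)
    return XDP_DROP;
if (ip->protocol != IPPROTO_ICMP)
    process_netflow_records(skb);
return XDP_PASS;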

Linux socket hardware timestamping

I'm working on a project researching network synchronisation. Since I want to achieve the best performance, I'm trying to compare software timestamping results with hardware timestamping ones.
I have followed this previously discussed question: Linux kernel UDP reception timestamp, but after several tests I ran into problems when trying to get hardware reception timestamps.
My scenario is composed of two devices, a PC and a Gateworks Ventana board. Both devices are supposed to wait for packets broadcast on their network and timestamp the reception times. I have tried this code (some parts have been omitted):
int rc = 1;
int flags;

flags = SOF_TIMESTAMPING_RX_HARDWARE
      | SOF_TIMESTAMPING_RAW_HARDWARE;
rc = setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPING, &flags, sizeof(flags));

rc = bind(sock, (struct sockaddr *) &serv_addr, sizeof(serv_addr));

struct msghdr msg;
struct iovec iov;
char pktbuf[2048];
char ctrl[CMSG_SPACE(sizeof(struct timespec))];
struct cmsghdr *cmsg = (struct cmsghdr *) &ctrl;

msg.msg_control = (char *) ctrl;
msg.msg_controllen = sizeof(ctrl);
msg.msg_name = &serv_addr;
msg.msg_namelen = sizeof(serv_addr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
iov.iov_base = pktbuf;
iov.iov_len = sizeof(pktbuf);

//struct timeval time_kernel, time_user;
//int timediff = 0;

FILE *f = fopen("server.csv", "w");
if (f == NULL) {
    error("Error opening file!\n");
    exit(1);
}
fprintf(f, "Time\n");

struct timespec ts;
int level, type;
int i;
for (i = 0; i < 10; i++) {
    rc = recvmsg(sock, &msg, 0);
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg))
    {
        level = cmsg->cmsg_level;
        type = cmsg->cmsg_type;
        if (SOL_SOCKET == level && SO_TIMESTAMPING == type) {
            //ts = (struct timespec *) CMSG_DATA(cmsg);
            memcpy(&ts, CMSG_DATA(cmsg), sizeof(ts));
            printf("HW TIMESTAMP %ld.%09ld\n", (long)ts.tv_sec, (long)ts.tv_nsec);
        }
    }
}
printf("COMPLETED\n");
fclose(f);
close(sock);
return 0;
}
On both devices, the output I get after receiving a packet is:
HW TIMESTAMP 0.000000000
On the other hand if with the same code my flags are:
flags = SOF_TIMESTAMPING_RX_HARDWARE
| SOF_TIMESTAMPING_RX_SOFTWARE
| SOF_TIMESTAMPING_SOFTWARE;
I get proper timestamps:
HW TIMESTAMP 1551721801.970270543
However, these seem to be software timestamps. What would be the correct way to obtain hardware timestamps for received packets?
First of all, use ethtool -T "your NIC" to make sure your hardware supports the hardware timestamping feature.
You need to explicitly tell Linux to enable the hardware timestamping feature of your NIC. To do that, you need an ioctl() call.
What you have to do is call it with SIOCSHWTSTAMP, which is a device request code indicating which device you want to handle as well as what you want to do. For example, there is a code called CDROMSTOP to stop the CD-ROM drive.
You also need to use an ifreq struct to configure your NIC.
You need something like this (note that SIOCSHWTSTAMP expects ifr_data to point to a struct hwtstamp_config, defined in linux/net_tstamp.h):
struct hwtstamp_config hwconfig = { .tx_type = HWTSTAMP_TX_OFF, .rx_filter = HWTSTAMP_FILTER_ALL };
struct ifreq ifconfig = { 0 };
strncpy(ifconfig.ifr_name, "your NIC name", sizeof(ifconfig.ifr_name));
ifconfig.ifr_data = (void *)&hwconfig;
ioctl("your file descriptor", SIOCSHWTSTAMP, &ifconfig);
Here are some pages that you can refer to:
ioctl manual page,
ifreq manual page,
Read part 3.
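As an aside: with SO_TIMESTAMPING, the control message payload is not a single struct timespec but a struct scm_timestamping (three timespecs, defined in linux/errqueue.h), and the raw hardware timestamp is the third element. A sketch of extracting it under that assumption (the control buffer must then be at least CMSG_SPACE(sizeof(struct scm_timestamping)) bytes):
#include <linux/errqueue.h>   /* struct scm_timestamping */

if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_TIMESTAMPING) {
    struct scm_timestamping tss;

    memcpy(&tss, CMSG_DATA(cmsg), sizeof(tss));
    /* ts[0] holds the software timestamp, ts[2] the raw hardware one */
    printf("HW TIMESTAMP %ld.%09ld\n",
           (long)tss.ts[2].tv_sec, (long)tss.ts[2].tv_nsec);
}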

MSG_PROXY not working to provide/specify alternate addresses for transparent proxying

I'm trying to write a transparent proxy that translates arbitrary UDP packets to a custom protocol and back again. I'm trying to use transparent proxying to read the incoming UDP packets that need translation, and to write the outgoing UDP packets that have just been reverse-translated.
My setup for the socket I use for both flavors of UDP sockets is as follows:
static int
setup_clear_sock(uint16_t proxy_port)
{
    struct sockaddr_in saddr;
    int sock;
    int val = 1;
    socklen_t ttllen = sizeof(std_ttl);

    sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (sock < 0)
    {
        perror("Failed to create clear proxy socket");
        return -1;
    }
    if (getsockopt(sock, IPPROTO_IP, IP_TTL, &std_ttl, &ttllen) < 0)
    {
        perror("Failed to read IP TTL option on clear proxy socket");
        return -1;
    }
    if (setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &val, sizeof(val)) < 0)
    {
        perror("Failed to set reuse address option on clear socket");
        return -1;
    }
    if (setsockopt(sock, IPPROTO_IP, IP_TRANSPARENT, &val, sizeof(val)) < 0)
    {
        perror("Failed to set transparent proxy option on clear socket");
        return -1;
    }
    saddr.sin_family = AF_INET;
    saddr.sin_port = htons(proxy_port);
    saddr.sin_addr.s_addr = INADDR_ANY;
    if (bind(sock, (struct sockaddr *) &saddr, sizeof(saddr)) < 0)
    {
        perror("Failed to bind local address to clear proxy socket");
        return -1;
    }
    return sock;
}
I have two distinct, but possibly related problems. First, when I read an incoming UDP packet from this socket, using this code:
struct sock_double_addr_in
{
    __SOCKADDR_COMMON (sin_);
    in_port_t sin_port_a;
    struct in_addr sin_addr_a;
    sa_family_t sin_family_b;
    in_port_t sin_port_b;
    struct in_addr sin_addr_b;
    unsigned char sin_zero[sizeof(struct sockaddr) - __SOCKADDR_COMMON_SIZE - 8
                           - sizeof(struct in_addr) - sizeof(in_port_t)];
};

void
handle_clear_sock(void)
{
    ssize_t rcvlen;
    uint16_t nbo_udp_len, coded_len;
    struct sockaddr_in saddr;
    struct sock_double_addr_in sdaddr;
    bch_coding_context_t ctx;
    socklen_t addrlen = sizeof(sdaddr);

    rcvlen = recvfrom(sock_clear, &clear_buf, sizeof(clear_buf),
                      MSG_DONTWAIT | MSG_PROXY,
                      (struct sockaddr *) &sdaddr, &addrlen);
    if (rcvlen < 0)
    {
        perror("Failed to receive a packet from clear socket");
        return;
    }
    ....
I don't see a destination address come back in sdaddr. The sin_family_b, sin_addr_b, and sin_port_b fields are all zero. I've done a block memory dump of the structure in gdb, and indeed the bytes are coming back zero from the kernel (it's not a bad placement of the field in my structure definition).
Temporarily working around this by hard-coding a fixed IP address and port for testing purposes, I can debug the rest of my proxy application until I get to the point of sending an outgoing UDP packet that has just been reverse-translated. That happens with this code:
    ....
    udp_len = ntohs(clear_buf.u16[2]);
    if (udp_len + 6 > decoded_len)
        fprintf(stderr, "Decoded fewer bytes (%u) than outputting in clear "
                "(6 + %u)!\n", decoded_len, udp_len);
    sdaddr.sin_family = AF_INET;
    sdaddr.sin_port_a = clear_buf.u16[0];
    sdaddr.sin_addr_a.s_addr = coded_buf.u32[4];
    sdaddr.sin_family_b = AF_INET;
    sdaddr.sin_port_b = clear_buf.u16[1];
    sdaddr.sin_addr_b.s_addr = coded_buf.u32[3];
    if (sendto(sock_clear, &(clear_buf.u16[3]), udp_len, MSG_PROXY,
               (struct sockaddr *) &sdaddr, sizeof(sdaddr)) < 0)
        perror("Failed to send a packet on clear socket");
}
and the packet never shows up. I've checked the entire contents of the sdaddr structure I've built, and all fields look good. The UDP payload data looks good. There's no error coming back from the sendto() syscall -- indeed, it returns zero. And the packet never shows up in wireshark.
So what's going on with my transparent proxying? How do I get this to work? (FWIW: development host is a generic x86_64 ubuntu 14.04 LTS box.) Thanks!
Alright, so I've got half an answer.
It turns out if I just use a RAW IP socket, with the IP_HDRINCL option turned on, and build the outgoing UDP packet in userspace with a full IP header, the kernel will honor the non-local source address and send the packet that way.
I'm now using a third socket, sock_output, for that purpose, and decoded UDP packets are coming out correctly. (Interesting side note: the UDP checksum field must either be zero, or the correct checksum value. Anything else causes the kernel to silently drop the packet, and you'll never see it go out. The kernel won't fill in the proper checksum for you if you zero it out, but if you specify it, it will verify that it's correct. No sending UDP with intentionally bad checksums this way.)
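For reference, here is a sketch of that raw-socket path (the helper name and buffer size are made up; ports and addresses are expected in network byte order; with IP_HDRINCL the kernel fills in the IP header checksum and, if left zero, the packet ID):
#include <arpa/inet.h>
#include <netinet/ip.h>
#include <netinet/udp.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

static ssize_t
send_udp_hdrincl(int raw_sock, uint32_t src_ip, uint16_t src_port,
                 uint32_t dst_ip, uint16_t dst_port,
                 const void *payload, size_t len)
{
    char pkt[1500];
    struct iphdr *ip = (struct iphdr *) pkt;
    struct udphdr *udp = (struct udphdr *) (pkt + sizeof(*ip));
    struct sockaddr_in dst;

    memset(pkt, 0, sizeof(*ip) + sizeof(*udp));
    ip->version = 4;
    ip->ihl = 5;
    ip->ttl = 64;
    ip->protocol = IPPROTO_UDP;
    ip->saddr = src_ip;               /* the non-local source address is honored here */
    ip->daddr = dst_ip;
    ip->tot_len = htons(sizeof(*ip) + sizeof(*udp) + len);
    /* ip->check left zero: the kernel computes the IP header checksum */

    udp->source = src_port;
    udp->dest = dst_port;
    udp->len = htons(sizeof(*udp) + len);
    udp->check = 0;                   /* 0 means "no checksum" for IPv4 UDP */

    memcpy(pkt + sizeof(*ip) + sizeof(*udp), payload, len);

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_addr.s_addr = dst_ip;
    return sendto(raw_sock, pkt, sizeof(*ip) + sizeof(*udp) + len, 0,
                  (struct sockaddr *) &dst, sizeof(dst));
}
Here raw_sock would be created with socket(PF_INET, SOCK_RAW, IPPROTO_RAW) and have IP_HDRINCL enabled via setsockopt().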
So the first half of the question remains: when I read a packet from sock_clear with the MSG_PROXY flag to recvfrom(), why do I not get the actual destination address in the second half of sdaddr?

Flexible socket application

I'm writing a game which is played over a LAN using sockets. I use a 4-byte length prefix to indicate how much data follows, like this:
void trust_recv(int sock, int length, char *buffer)
{
    int recved = 0;
    int justRecv;

    while (recved < length) {
        justRecv = recv(sock, buffer + recved, length - recved, 0);
        if (justRecv < 0) return;
        recved += justRecv;
    }
}

void onDataArrival(int sock)
{
    int length;
    char *data;

    trust_recv(sock, 4, (char *) &length);
    data = new char[length];
    trust_recv(sock, length, data);
    do_somethings_with_data(data);
}
The problem is that if someone (an intruder or hacker, for example) sends data in another format (maybe only 2 bytes, or less data than the 4-byte prefix promises), or there's a network problem, my application goes into a "not responding" state and has to be closed (because I use blocking sockets). How can I make my socket application more robust without switching the socket to non-blocking mode? (Any ideas for organizing the data or algorithms are welcome as well.)
You can set a receive timeout, during the socket setup phase, with a setsockopt() call and the SO_RCVTIMEO option:
struct timeval tv;
tv.tv_sec = 8;
tv.tv_usec = 0;
if (setsockopt(your_sock_fd, SOL_SOCKET, SO_RCVTIMEO, (char *)&tv, sizeof tv) < 0)
    perror("setsockopt error");
then test the return value of recv() and check errno:
if (justRecv < 0)
{
    if (errno == EAGAIN)
        perror("TIMEOUT!");
    return;
}