eBPF: running in Linux namespaces - bpf

So BPF programs are the kernel entities, as they run within the kernel space. On the other hand, Linux namespaces aka containers, provide application-level isolation, in which case they all share the host's kernel, kernel modules etc.
So I guess it doesn't make sense to load a bpf program per container, as it will become visible on the host as well?
Therefore I would guess that bpf programs would load on the host and monitor/mangle/etc. packets to/from namespaces. Given that struct sock has the information about namespace id, I think only certain types of bpf programs would be able to do that?

So I guess it doesn't make sense to load a bpf program per container, as it will become visible on the host as well?
If you mean loading a BPF program from a container, then yes it would be visible on the host as well (and you would need a privileged container in order to do that).
Given that struct sock has the information about namespace id, I think only certain types of bpf programs would be able to do that?
I couldn't find any BPF program type that has direct access to struct sock. BPF programs of type sockops have access to struct bpf_sock, but it contains few actual information.
You could use BPF programs of type cgroup/skb though. Those are attached to cgroups and can act on both ingress and egress packets. They receive a struct __sk_buff object as argument, a mirror of the sk_buff for the packet received/sent. They can only use a few helpers (in addition to the common base), and don't seem to have write access to packets.
kprobe BPF programs have access to any kernel function kprobes can be attached to. So you could retrieve the namespace information by probing the appropriate function and then send it to your monitor/mangle/etc. program through a bpf map. Not the simplest option though.

Related

ebpf-tc: how to keep unique information inside a ebpf instance when same program is attached to multiple interface

When we pin a MAP, we can able to share information from userspace to ebpf but it is system wide.
But if i want to share different value to different instance from tc ingress/egress (array map with size of 1)
Is there any way to pass argument ?
Map (unpinned unique per instance) - update from userspace
Any other way to communicate from userspace to kernel (while attaching or after)
Really appreciate your help.
Pinning a map doesn't make it system wide. Every map is always accessible system wide, pinning just adds a reference to the file system to make it easier to find and to make sure a map isn't removed even when not in use by any program.
Is there any way to pass argument ?
No, once a program is loaded, only the kernel can pass arguments(contexts) to a program, userspace can only use maps to communicate with eBPF programs.
Map (unpinned unique per instance) - update from userspace
Any userspace program with the right permissions can update any map as long as you can obtain a file descriptor to the map. Map FDs can be obtained:
By creating a new map
By opening a map pin
By using its unique ID (directly or by looping over all loaded maps)
By IPC from another process that already has the FD
Any other way to communicate from userspace to kernel (while attaching or after)
Maps are it. You can rewrite the program before loading it into the kernel by setting constants at specific locations, but not at attach time. One way which might be interesting is that on newer kernels, global data is supported. Which allow you to change the value of variables defined in the global scope from userspace. In this case the global data is packed in a array map with a single key/value.

share information between function(BPF/XDP)

Objective: If process id/name = xxx then drop the packet
So, I am bit confused. So far I know you can't extract process information from XDP but bpf trace allows you to trace it.
Here's my probable solution, use bpf hash maps to share information between two function. If process name == xx then XDP_DROP. (This maybe wrong, but something I was trying)
But I am confused how to use BPF_HASHMAPS, I read the documentation on bcc yet..
Example: From this hello function I can trace events
struct data_t {
u32 pid;
u64 ts;
char comm[TASK_COMM_LEN];
};
BPF_PERF_OUTPUT(events);
int hello(struct pt_regs *ctx) {
struct data_t data = {};
data.pid = bpf_get_current_pid_tgid();
data.ts = bpf_ktime_get_ns();
bpf_get_current_comm(&data.comm, sizeof(data.comm));
events.perf_submit(ctx, &data, sizeof(data));
return 0;
}
XDP function to drop packer
int udpfilter(struct xdp_md *ctx) {
bpf_trace_printk("got a packet\n");
//u32 cpu = bpf_get_smp_processor_id();
//bpf_trace_printk("%s looking\n",cpu);
//u32 pid = bpf_get_current_pid_tgid();
return XDP_DROP;
}
Now how do I fetch pid value and use it in XDP function, plus does the solution even makes any sense. Thanks for the help, really appreciated.
So, as you know eBPF programs can be loaded into the kernel at different locations. XDP programs are loaded just after the network driver and just before the network stack. At this point the kernel doesn't know for which process a packet might be since it will figure all of that out in the network stack.
The hello program you are showing is an example of a kprobe(kernel probe). It attaches to whatever kernel function you specify, but it is a tracing tool, can't make changes.
Also, some helper functions like bpf_get_current_pid_tgid are program type dependent. bpf_get_current_pid_tgid only works in kprobes, uprobes, tracepoint programs (perf programs), the may actually also work in socket and cGroup programs, the issue is that there is not a very clear list or overview of which work where, these are two good but non-comprehensive links:
https://blogs.oracle.com/linux/post/bpf-in-depth-bpf-helper-functions
https://github.com/iovisor/bcc/blob/master/docs/kernel-versions.md#program-types
In the end it comes down to logic. The kernel can only give you access to data and actions it has access to itself. So if you want to do network related things based on process ID's you might need to use an eBPF program attached at a location where such info is available(keep in mind that this is obviously also slower).
So depending on what exactly you want to do you have a few options:
Attach an eBPF program to a network socket(BPF_PROG_TYPE_SOCKET_FILTER) so you can filter packets on the socket level. This does require the program that creates the socket to attach the program to it.
Use a cGroup and BPF_PROG_TYPE_CGROUP_SKB program to block packets. Since you attach the program to the cGroup, this doesn't require cooperation from the program.
Use an TC program(BPF_PROG_TYPE_SCHED_ACT), on this level a packet is already parsed, but you still need to match it to a process
Use an XDP program(BPF_PROG_TYPE_XDP) can still be used, this does require you to parse all layers of the network packet(Ethernet, VLAN, IP, UDP/TCP), and then manually extract the protocol, Destination IP, and Destination port. Just like in the TC program you then need to match it to an pid using a lookup table.
When going the XDP or TC route you need to create this lookup table. As far as I know you can't access the table of the kernel via helper functions. A few approaches are:
parsing the output of netstat -lpn(protocol, destination ip, destination port and PID) and setting the data in a map to be used by a program
Getting the same data but directly from /sys or /proc(I don't know where the data is stored exactly)
Recording which PIDs have which sockets during creation(using a second program(kprobe/tracepoint)) and setting this data in a map shared by both the XDP/TC program and the trace program. (not quite sure how to share maps between programs in BCC, but it is certainly possible when using libbpf)

How to share a ebpf map between interfaces

Is it possible to share an ebpf Map between two network interfaces.
I want to write an XDP program and hook it on two devices namely eth0 and eth1. The implementation requires that they both use the same map.
Is it possible to load the same program, hooking them at eth0 and eth1 and use the same Map.
Thank you all!
Yes, this is entirely possible. An eBPF map is not attached to an interface, it is created in the kernel and then referenced by one or several of:
A file descriptor, from the application that created the map,
An eBPF program that uses the map,
A reference in another map of a dedicated type (array-of-maps, hash-of-maps),
A pinned path in the eBPF virtual file system.
eBPF maps can be shared between consecutive program runs, between kernel and user space, or - as for your use case - between different eBPF programs, no matter what interfaces they are attached to. Note that a given program could even be detached and later re-attached to another interface, anyway.
For reusing a map for several programs, this is done by pointing to the same map when you load your programs: What happens when you load one is that you get a handle to the map (its id or its pinned path, for example), get a file descriptor to the map from that handle, and place this file descriptor in your eBPF bytecode before loading it. Then the kernel translates the file descriptor into the relevant memory address.
What happens most of the time in practice is that this “relocation” step (placing the file descriptor in the bytecode) is handled for you by the framework you are using to load your programs, such as libbpf or bcc tools. For example, libbpf has a bpf_map__reuse_fd(struct bpf_map *map, int fd) function to explicitly reuse a given file descriptor for a specific map used by a program, after parsing an object file to extract the bytecode but before loading it.

What's the difference between endpoint and socket?

Almost every definition of socket that I've seen, relates it very closely to the term endpoint:
wikipedia:
A network socket is an internal endpoint for sending or receiving data
at a single node in a computer network. Concretely, it is a
representation of this endpoint in networking software
This answer:
a socket is an endpoint in a (bidirectional) communication
Oracle's definition:
A socket is one endpoint of a two-way communication link between two
programs running on the network
Even stackoverflow's definition of the tag 'sockets' is:
An endpoint of a bidirectional inter-process communication flow
This other answer goes a bit further:
A TCP socket is an endpoint instance
Although I don't understand what "instance" means in this case. If an endpoint is, according to this answer, a URL, I don't see how that can be instantiated.
"Endpoint" is a general term, including pipes, interfaces, nodes and such, while "socket" is a specific term in networking.
IMHO - logically (emphasis added) "socket" and "endpoint" are same, because they both are concatenation of an Internet Address with a TCP port. Strictly technically speaking in core-networking, there is nothing like "endpoint", there is only "socket". Go on, read more below...
As #Zac67 highligted, "socket" is a very specific term in networking - if you read TCP RFC (https://www.rfc-editor.org/rfc/rfc793) then you won't find even a single reference of "endpoint", it only talks about "socket". But when you come out of RFC world, you will hear a lot about "endpoint".
Now, they both talk about combination of IP address and a TCP port, but you can't say someone that "please give me socket of your application", you will say "please give me endpoint of your application". So, IMHO the way someone can understand difference between Socket and Endpoint is - even though both refer to combination of IP address and TCP port, but you use term "socket" when you are talking in context of computer processes or in context of OS, otherwise when talking with someone in general you will use "endpoint".
I am a guy coming from embedded systems world and low level things,
Endpoint is a hardware buffer constructed at the far end of your machine, what does that mean?
YourMachine <---------------> Device
[Socket] ----------------> [Endpoint]
[Endpoint] <---------------- [Socket]
Both sockets and endpoints are endpoints but socket is an endpoint that resides on the sender which here your machine[Socket is a word used to distinguish between sender and receiver]
OK, now that we know it is a buffer, what is the relation between buffers and networking?
Windows
When you create a socket on Windows, the OS returns a handle to that socket, in fact socket is actually a kernel object, so in Windows when you create a kernel object the returned value is a handle which is used to access that object, usually handles are void* which is then casted into numerical value that Windows can understand, now that you have access to socket kernel object, all IO operations are handled in the OS kernel and since you want to communicate with external device then you have to reach kernel first and the socket is exactly doing this, in other words, socket creates a socket object in the kernel = creates an endpoint in the kernel = creates a buffer in the kernel, that buffer is used to stream data through wires later on using OS HAL(Hardware abstraction layer) and you can talk to other devices and you are happy
Now, if the other device doesn't have communication buffer = endpoint, then you can't communicate with it, even if you open a socket on your end, it has to be two way data communication = Send and Receive
Another example of accessing IO peripheral is accessing RAM (Main memory), two ways of accessing RAM, either you access process stack or access process heap, the stack is not a kernel object in fact you can access stack directly without reaching OS kernel, simply by subtracting a value from RSP(Stack pointer register), example:
; This example demonstrates how to allocate 32 contiguous bytes from stack on Windows OS
; Intel syntax
StackAllocate proc
sub rsp, 20h
ret
StackAllocate endp
Accessing heap is different, the heap is a kernel object, so when you call malloc()/new operator in your code a long call stack is called through windows code, the point is reaching RAM requires kernel help, the stack allocation above is actually not reaching RAM, all I did is subtracting a number of an existing value in RSP which is inside CPU so I did not go outside, the heap object in kernel returns a handle that Windows use to manage fragmented memory and in the end returns a void* to that memory
Hope that helped

Device Drivers vs the /dev + glibc Interface

I am looking to have the processor read from I2C and store the data in DDR in an embedded system. As I have been looking at solutions, I have been introduced to Linux device drivers as well as the GNU C Library. It seems like for many operations you can perform with the basic Linux drivers you can also perform with basic glibc system calls. I am somewhat confused when one should be used over the other. Both interfaces can be accessed from the user space.
When should I use a kernel driver to access a device like I2C or USB and when should I use the GNU C Library system functions?
The GNU C Library forwards function calls such as read, write, ioctl directly to the kernel. These functions are just very thin wrappers around the system calls. You could call the kernel all by yourself using inline assembly, but that is rarely helpful. So in this sense, all interactions with the kernel driver will go through these glibc functions.
If you have questions about specific interfaces and their trade-offs, you need to name them explicitly.
In ARM:
Privilege states are built into the processor and are changed via assembly commands. A memory protection unit, a part of the chip, is configured to disallow access to arbitrary ranges of memory depending on the privilege status.
In the case of the Linux kernel, ALL physical memory is privileged- memory addresses in userspace are virtual (fake) addresses, translated to real addresses once in privileged mode.
So, to access a privileged memory range, the mechanics are like a function call- you set the parameters indicating what you want, and then make a ('SVC')- an interrupt function which removes control of the program from userspace, gives it to the kernel. The kernel looks at your parameters and does what you need.
The standard library basically makes that whole process easier.
Drivers create interfaces to physical memory addresses and provide an API through the SVC call and whatever 'arguments' it's passed.
If physical memory is not reserved by a driver, the kernel generally won't allow anyone to access it.
Accessing physical memory you're not privileged to will cause a "bus error".
BTW: You can use a driver like UIO to put physical memory into userspace.