Register errors when trying to separate strings and print them using BCC - ebpf

I would like to split a string into words and print them one by one, but the verifier rejects the program with the errors below. Changing const char str[] to const char *str just nets me an opcode 00 error. I seem to be heading down the right path but need help printing the words one by one. The same code running on an online compiler shows the intended output (String Parser online IDE).
from bcc import BPF
# BPF PROGRAM
bpfprogram = """
int helloworld2(void *ctx)
{
const char str[] = "here are some words";
int length = sizeof(str);
int start = 0;
//#pragma unroll Tried using this but does not really fix the issue.
for (int i = 0; i < sizeof(str); i++) {
if (str[i] == ' ') {
bpf_trace_printk("%s\\n", i - start, str + start);
start = i + 1;
}
}
bpf_trace_printk("%s\\n", length - start, str + start);
return 0;
}
"""
# This compiles the program defined by the bpfprogram string into bpf bytecode and
#loads it to the kernel BPF verifier.
b = BPF(text=bpfprogram)
# This attaches the compiled BPF program to a kernel event of your choosing,
#in this case to the sys_clone syscall which will cause the BPF program to run
#every time the sys_clone call occurs.
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="helloworld2")
# Capture and print the BPF program's trace output
b.trace_print()
Here is the error that I am seeing. Using a pointer instead of a char array just nets me an opcode 00 error, and #pragma unroll does not fix the issue either. Is there a solution to this problem that I am not seeing? One notable error is near the end: R4 bitwise operator |= on pointer prohibited
bpf: Failed to load program: Permission denied
btf_vmlinux is malformed
Unrecognized arg#0 type PTR
; int helloworld2(void *ctx)
0: (b7) r1 = 7562354
; const char str[] = "here are some words";
1: (63) *(u32 *)(r10 -8) = r1
2: (18) r1 = 0x6f7720656d6f7320
4: (7b) *(u64 *)(r10 -16) = r1
5: (18) r1 = 0x6572612065726568
7: (7b) *(u64 *)(r10 -24) = r1
8: (b7) r6 = 684837
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
9: (63) *(u32 *)(r10 -28) = r6
10: (bf) r1 = r10
;
11: (07) r1 += -28
12: (bf) r4 = r10
13: (07) r4 += -24
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
14: (b7) r2 = 4
15: (b7) r3 = 4
16: (85) call bpf_trace_printk#6
last_idx 16 first_idx 0
regs=4 stack=0 before 15: (b7) r3 = 4
regs=4 stack=0 before 14: (b7) r2 = 4
17: (b7) r1 = 5
; if (str[i] == ' ') {
18: (71) r2 = *(u8 *)(r10 -19)
; if (str[i] == ' ') {
19: (55) if r2 != 0x20 goto pc+9
R0_w=inv(id=0) R1_w=inv5 R2_w=inv32 R6_w=inv684837 R10=fp0 fp-8=????mmmm fp-16_w=inv8031924080438375200 fp-24_w=inv7310011936944579944 fp-32=mmmm????
;
20: (bf) r4 = r10
21: (07) r4 += -19
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
22: (63) *(u32 *)(r10 -28) = r6
23: (bf) r1 = r10
;
24: (07) r1 += -28
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
25: (b7) r2 = 4
26: (b7) r3 = 0
27: (85) call bpf_trace_printk#6
last_idx 27 first_idx 0
regs=4 stack=0 before 26: (b7) r3 = 0
regs=4 stack=0 before 25: (b7) r2 = 4
28: (b7) r1 = 6
; if (str[i] == ' ') {
29: (71) r2 = *(u8 *)(r10 -18)
; if (str[i] == ' ') {
30: (55) if r2 != 0x20 goto pc+12
R0=inv(id=0) R1_w=inv6 R2_w=inv32 R6=inv684837 R10=fp0 fp-8=????mmmm fp-16=inv8031924080438375200 fp-24=inv7310011936944579944 fp-32=mmmm????
31: (b7) r2 = 684837
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
32: (63) *(u32 *)(r10 -28) = r2
33: (b7) r3 = 6
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
34: (1f) r3 -= r1
35: (bf) r4 = r10
;
36: (07) r4 += -24
; ({ char _fmt[] = "%s\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), i - start, str + start); });
37: (4f) r4 |= r1
last_idx 37 first_idx 28
regs=2 stack=0 before 36: (07) r4 += -24
regs=2 stack=0 before 35: (bf) r4 = r10
regs=2 stack=0 before 34: (1f) r3 -= r1
regs=2 stack=0 before 33: (b7) r3 = 6
regs=2 stack=0 before 32: (63) *(u32 *)(r10 -28) = r2
regs=2 stack=0 before 31: (b7) r2 = 684837
regs=2 stack=0 before 30: (55) if r2 != 0x20 goto pc+12
regs=2 stack=0 before 29: (71) r2 = *(u8 *)(r10 -18)
regs=2 stack=0 before 28: (b7) r1 = 6
R4 bitwise operator |= on pointer prohibited
processed 36 insns (limit 1000000) max_states_per_insn 0 total_states 1 peak_states 1 mark_read 1
Traceback (most recent call last):
File "BPFHelloWorld.py", line 31, in <module>
b.attach_kprobe(event=b.get_syscall_fnname("clone"), fn_name="helloworld2")
File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 654, in attach_kprobe
fn = self.load_func(fn_name, BPF.KPROBE)
File "/usr/lib/python3/dist-packages/bcc/__init__.py", line 394, in load_func
raise Exception("Failed to load BPF program %s: %s" %
Exception: Failed to load BPF program b'helloworld2': Permission denied

You are using bcc's bpf_trace_printk function, which expects different arguments from the kernel helper.
In bcc, bpf_trace_printk is a wrapper around the corresponding BPF helper. According to its documentation, it expects one mandatory argument, the format string, followed by optional arguments:
Syntax: int bpf_trace_printk(const char *fmt, ...)
So you can write things like:
bpf_trace_printk("remote-port: %d, local-port: %d\\n", skk.remote_port,
skk.local_port);
In contrast, the BPF helper expects the first argument to be the string and the second to be the size of the string:
static const struct bpf_func_proto bpf_trace_printk_proto = {
.func = bpf_trace_printk,
.gpl_only = true,
.ret_type = RET_INTEGER,
.arg1_type = ARG_PTR_TO_MEM,
.arg2_type = ARG_CONST_SIZE,
};
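Applied to the question's code, the bcc wrapper passes `i - start` as the value for `%s` and `str + start` as a second, unused argument, so the length argument has to go. A remaining obstacle, assuming the helper's `%s` only accepts a NUL-terminated string and has no length-limited form, is that each word must be copied into its own terminated buffer before it is printed. The following is a minimal user-space C sketch of that splitting step, not a BPF program; `split_words`, `MAX_WORDS` and `MAX_WORD` are hypothetical names, and inside BPF the loop would additionally need bounds the verifier can follow.

```c
#include <stddef.h>
#include <string.h>

#define MAX_WORDS 8   /* hypothetical cap on the number of words */
#define MAX_WORD  16  /* hypothetical cap on word length, including the NUL */

/*
 * Split `str` on spaces and copy each word into its own NUL-terminated
 * slot of `out`. `len` must include the trailing NUL of `str`
 * (i.e. sizeof(str) for a char array), otherwise the last word is skipped.
 * Returns the number of words stored.
 */
static int split_words(const char *str, size_t len,
                       char out[MAX_WORDS][MAX_WORD])
{
    int words = 0;
    size_t start = 0;

    for (size_t i = 0; i < len && words < MAX_WORDS; i++) {
        /* A word ends at a space or at the terminating NUL. */
        if (str[i] != ' ' && str[i] != '\0')
            continue;

        size_t n = i - start;
        if (n >= MAX_WORD)
            n = MAX_WORD - 1;      /* truncate over-long words */

        memcpy(out[words], str + start, n);
        out[words][n] = '\0';      /* terminator so "%s" knows where to stop */
        words++;
        start = i + 1;
    }
    return words;
}
```

In a bcc program each filled slot could then be printed with a single-argument call such as bpf_trace_printk("%s\\n", out[k]); whether the verifier accepts this may depend on the kernel version.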

How to store packet offset in BPF Map or skb->cb field?

Example Code
I want to pass the offset to the following tail calls. But when I try to store it in a BPF map (Method 1) or in the skb->cb field (Method 2), I get the error "offset is outside of the packet".
If the Method 1/2 code is removed, the BPF program loads successfully.
#include <vmlinux.h>
#include <bpf/bpf_endian.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
struct {
__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
__type(key, __u32);
__type(value, __u32);
__uint(max_entries, 100);
} state_vars SEC(".maps");
SEC("tc")
int tc_ingress(struct __sk_buff *ctx) {
void *data_end = (void *)(__u64)ctx->data_end;
void *data = (void *)(__u64)ctx->data;
u32 data_len = data_end - data;
u32 rn = 0;
u32 rn_idx = 0;
for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
char c = *(char*)(data + rn);
if (c == '\r') {
rn_idx = rn;
break;
}
}
/// Method 1
// u32 var_idx = 0;
// bpf_map_update_elem(&state_vars, &var_idx, &rn_idx, BPF_ANY);
/// Method 2
// ctx->cb[0] = rn_idx;
return TC_ACT_OK;
}
Error Message
Method 1
; void *data = (void *)(__u64)ctx->data;
0: (61) r2 = *(u32 *)(r1 +76)
; void *data_end = (void *)(__u64)ctx->data_end;
1: (61) r3 = *(u32 *)(r1 +80)
2: (b7) r1 = 0
; u32 rn_idx = 0;
3: (63) *(u32 *)(r10 -4) = r1
last_idx 3 first_idx 0
regs=2 stack=0 before 2: (b7) r1 = 0
; u32 data_len = data_end - data;
4: (bf) r4 = r3
5: (1f) r4 -= r2
6: (67) r4 <<= 32
7: (77) r4 >>= 32
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
8: (15) if r4 == 0x0 goto pc+10
R1_w=invP0 R2_w=pkt(id=0,off=0,r=0,imm=0) R3_w=pkt_end(id=0,off=0,imm=0) R4_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R10=fp0 fp-8=0000????
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
9: (bf) r5 = r2
10: (0f) r5 += r1
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
11: (3d) if r5 >= r3 goto pc+7
R1_w=invP0 R2_w=pkt(id=0,off=0,r=0,imm=0) R3_w=pkt_end(id=0,off=0,imm=0) R4_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R5_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0 fp-8=0000????
; char c = *(char*)(data + rn);
12: (71) r5 = *(u8 *)(r5 +0)
invalid access to packet, off=0 size=1, R5(id=0,off=0,r=0)
R5 offset is outside of the packet
processed 13 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Method 2
; void *data = (void *)(__u64)ctx->data;
1: (61) r3 = *(u32 *)(r1 +76)
; void *data_end = (void *)(__u64)ctx->data_end;
2: (61) r4 = *(u32 *)(r1 +80)
; u32 data_len = data_end - data;
3: (bf) r5 = r4
4: (1f) r5 -= r3
5: (bf) r0 = r5
6: (67) r0 <<= 32
7: (77) r0 >>= 32
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
8: (15) if r0 == 0x0 goto pc+21
R0_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0) R2_w=inv0 R3_w=pkt(id=0,off=0,r=0,imm=0) R4_w=pkt_end(id=0,off=0,imm=0) R5_w=inv(id=0) R10=fp0
9: (b7) r2 = 0
10: (b7) r0 = 0
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
11: (bf) r6 = r3
12: (0f) r6 += r0
last_idx 12 first_idx 0
regs=1 stack=0 before 11: (bf) r6 = r3
regs=1 stack=0 before 10: (b7) r0 = 0
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
13: (3d) if r6 >= r4 goto pc+16
R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R2_w=inv0 R3_w=pkt(id=0,off=0,r=0,imm=0) R4_w=pkt_end(id=0,off=0,imm=0) R5_w=inv(id=0) R6_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0
; char c = *(char*)(data + rn);
14: (71) r6 = *(u8 *)(r6 +0)
invalid access to packet, off=0 size=1, R6(id=0,off=0,r=0)
R6 offset is outside of the packet
processed 15 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
Question
Why does the error appear when Method 1/2 code is added?
How could I store the offset value in BPF MAP or skb->cb field?
Update 2022.11.1
Following @pchaigno's advice, I added +1 to the condition, but I get a similar error:
; int tc_ingress(struct __sk_buff *ctx)
0: (b7) r2 = 0
; void *data = (void *)(__u64)ctx->data;
1: (61) r3 = *(u32 *)(r1 +76)
; void *data_end = (void *)(__u64)ctx->data_end;
2: (61) r4 = *(u32 *)(r1 +80)
; u32 data_len = data_end - data;
3: (bf) r5 = r4
4: (1f) r5 -= r3
5: (bf) r0 = r5
6: (67) r0 <<= 32
7: (77) r0 >>= 32
; for (rn = 0; rn < 1000 && rn < data_len && data + rn + 1 < data_end; rn++) {
8: (15) if r0 == 0x0 goto pc+23
R0_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R1=ctx(id=0,off=0,imm=0) R2_w=inv0 R3_w=pkt(id=0,off=0,r=0,imm=0) R4_w=pkt_end(id=0,off=0,imm=0) R5_w=inv(id=0) R10=fp0
9: (b7) r2 = 0
10: (b7) r0 = 0
; for (rn = 0; rn < 1000 && rn < data_len && data + rn + 1 < data_end; rn++) {
11: (bf) r6 = r3
12: (0f) r6 += r0
last_idx 12 first_idx 0
regs=1 stack=0 before 11: (bf) r6 = r3
regs=1 stack=0 before 10: (b7) r0 = 0
13: (bf) r7 = r6
14: (07) r7 += 1
; for (rn = 0; rn < 1000 && rn < data_len && data + rn + 1 < data_end; rn++) {
15: (3d) if r7 >= r4 goto pc+16
R0_w=invP0 R1=ctx(id=0,off=0,imm=0) R2_w=inv0 R3_w=pkt(id=0,off=0,r=0,imm=0) R4_w=pkt_end(id=0,off=0,imm=0) R5_w=inv(id=0) R6_w=pkt(id=0,off=0,r=0,imm=0) R7_w=pkt(id=0,off=1,r=0,imm=0) R10=fp0
; char c = *(char*)(data + rn);
16: (71) r6 = *(u8 *)(r6 +0)
invalid access to packet, off=0 size=1, R6(id=0,off=0,r=0)
R6 offset is outside of the packet
processed 17 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
TL;DR. The issue is when you read the packet data, not when you write it. You have an off-by-one issue on the bounds check. The issue only appears once you use the read data because otherwise the compiler optimizes out the code.
Verifier Error Explanation
; for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
11: (3d) if r5 >= r3 goto pc+7
R1_w=invP0 R2_w=pkt(id=0,off=0,r=0,imm=0) R3_w=pkt_end(id=0,off=0,imm=0) R4_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R5_w=pkt(id=0,off=0,r=0,imm=0) R10=fp0 fp-8=0000????
; char c = *(char*)(data + rn);
12: (71) r5 = *(u8 *)(r5 +0)
invalid access to packet, off=0 size=1, R5(id=0,off=0,r=0)
The verifier says that the packet access is out of bounds by one: R5's offset + size > R3's offset. That is, both offsets are 0 and the access size is 1, so 0 + 1 > 0.
Root Cause
Your bounds check is off by one:
for (rn = 0; rn < 1000 && rn < data_len && data + rn < data_end; rn++) {
char c = *(char*)(data + rn);
To account for the access size, it should be:
for (rn = 0; rn < 1000 && rn < data_len && data + rn + 1 < data_end; rn++) {
char c = *(char*)(data + rn);
Why does it only happen once you write the packet data?
If you don't write rn_idx anywhere, then the compiler understands that rn_idx and c are not needed. All code for those variables is compiled out and the out-of-bound packet access is removed.
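The access-size rule can be illustrated outside the kernel. Below is a minimal user-space C sketch (the function name `find_cr` and the constant `MAX_SCAN` are made up for illustration) that scans for '\r' and checks, before each 1-byte read, that the end of that byte still lies within the buffer. It uses the common bail-out form `offset + size > length`, which also admits the last byte; the `<` condition shown above is one byte more conservative. In a real BPF program the comparison would be made between the packet pointer and data_end, since that is the comparison the verifier tracks.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_SCAN 1000  /* same constant cap as the loop in the question */

/*
 * Return the offset of the first '\r' in [data, data_end), or -1 if none
 * is found within the first MAX_SCAN bytes.
 */
static int find_cr(const uint8_t *data, const uint8_t *data_end)
{
    size_t len = (size_t)(data_end - data);

    for (size_t rn = 0; rn < MAX_SCAN; rn++) {
        /*
         * The read below touches 1 byte at offset rn, so offset + size
         * must not exceed the length. In BPF terms this corresponds to
         * bailing out when data + rn + 1 > data_end.
         */
        if (rn + 1 > len)
            break;
        if (data[rn] == '\r')
            return (int)rn;
    }
    return -1;
}
```

With a five-byte buffer holding "GET\r\n", the function returns 3; with an empty range it returns -1 without reading anything.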

Zabbix server not starting after upgrade from 4.0 to 6.0

Zabbix server not starting after upgrading from 4.0 to 6.0
OS Oracle Linux 8. Database postgresql 14, web is nginx. cachesize is 2048M.
zabbix upgrade to 6.0
1142698:20220327:151229.553 ====== Fatal information: ======
1142698:20220327:151229.553 Program counter: 0x5626e015fd3a
1142698:20220327:151229.553 === Registers: ===
1142698:20220327:151229.554 r8 = 5626e247bb70 = 94725005097840 = 94725005097840
1142698:20220327:151229.554 r9 = 5626e246f5c0 = 94725005047232 = 94725005047232
1142698:20220327:151229.554 r10 = 0 = 0 = 0
1142698:20220327:151229.554 r11 = f = 15 = 15
1142698:20220327:151229.554 r12 = 5626e2430800 = 94725004789760 = 94725004789760
1142698:20220327:151229.554 r13 = 5626e027998b = 94724969437579 = 94724969437579
1142698:20220327:151229.554 r14 = 0 = 0 = 0
1142698:20220327:151229.554 r15 = 0 = 0 = 0
1142698:20220327:151229.554 rdi = 0 = 0 = 0
1142698:20220327:151229.554 rsi = 7ffc909bdb64 = 140722734619492 = 140722734619492
1142698:20220327:151229.554 rbp = 7ffc909bdb10 = 140722734619408 = 140722734619408
1142698:20220327:151229.554 rbx = 1 = 1 = 1
1142698:20220327:151229.554 rdx = 7ffc909bdb64 = 140722734619492 = 140722734619492
1142698:20220327:151229.554 rax = 0 = 0 = 0
1142698:20220327:151229.554 rcx = 0 = 0 = 0
1142698:20220327:151229.554 rsp = 7ffc909bdb10 = 140722734619408 = 140722734619408
1142698:20220327:151229.554 rip = 5626e015fd3a = 94724968283450 = 94724968283450
1142698:20220327:151229.554 efl = 10202 = 66050 = 66050
1142698:20220327:151229.554 csgsfs = 2b000000000033 = 12103423998558259 = 12103423998558259
1142698:20220327:151229.554 err = 4 = 4 = 4
1142698:20220327:151229.554 trapno = e = 14 = 14
1142698:20220327:151229.554 oldmask = 0 = 0 = 0
1142698:20220327:151229.554 cr2 = 0 = 0 = 0
1142698:20220327:151229.554 === Backtrace: ===
1142698:20220327:151229.555 15: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_backtrace+0x3f) [0x5626e0150815]
1142698:20220327:151229.555 14: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_log_fatal_info+0x141) [0x5626e0150a72]
1142698:20220327:151229.555 13: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](+0x24c25e) [0x5626e015125e]
1142698:20220327:151229.555 12: /lib64/libpthread.so.0(+0x12c30) [0x7f0660ebdc30]
1142698:20220327:151229.555 11: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_deserialize_uint31_compact+0x10) [0x5626e015fd3a]
1142698:20220327:151229.555 10: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_get_serialized_expression_functionids+0x27) [0x5626e01418fe]
1142698:20220327:151229.555 9: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_dc_get_event_maintenances+0x240) [0x5626e00ff7a3]
1142698:20220327:151229.555 8: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](+0x9fb21) [0x5626dffa4b21]
1142698:20220327:151229.555 7: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](timer_thread+0x1f2) [0x5626dffa53ad]
1142698:20220327:151229.555 6: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](zbx_thread_start+0x37) [0x5626e0160849]
1142698:20220327:151229.555 5: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](+0x66b33) [0x5626dff6bb33]
1142698:20220327:151229.555 4: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](MAIN_ZABBIX_ENTRY+0x7b8) [0x5626dff6ca5e]
1142698:20220327:151229.555 3: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](daemon_start+0x2f5) [0x5626e0150434]
1142698:20220327:151229.555 2: /usr/sbin/zabbix_server: timer #1 [started, processing maintenances](main+0x33a) [0x5626dff6afee]
1142698:20220327:151229.557 Please consider attaching a disassembly listing to your bug report.
1142698:20220327:151229.557 This listing can be produced with, e.g., objdump -DSswx zabbix_server.
1142698:20220327:151229.557 ================================
1142677:20220327:151229.559 One child process died (PID:1142698,exitcode/signal:1). Exiting ...
1142677:20220327:151229.657 PROCESS EXIT: 1142698
1142678:20220327:151229.657 HA manager has been paused
zabbix_server [1142677]: Error waiting for process with PID 1142698: [10] No child processes
1142678:20220327:151229.705 HA manager has been stopped
1142677:20220327:151229.745 syncing history data...
1142677:20220327:151229.746 error reason for "404b30b2-eaf6-11e1-8c79-b4c22c3547a9:vmware.hv.cpu.usage.perc" changed: Cannot evaluate function: item "/404b30b2-eaf6-11e1-8c79-b4c22c3547a9/vmware.hv.cpu.usage[{$URL},{HOST.HOST}]" is not supported at "last(//vmware.hv.cpu.usage[{$URL},{HOST.HOST}])*100)/(last(//vmware.hv.hw.cpu.num[{$URL},{HOST.HOST}])*last(//vmware.hv.hw.cpu.freq[{$URL},{HOST.HOST}]))".
1142677:20220327:151229.746 error reason for "38383135-3837-5a43-3331-353033444d35:vmware.hv.network.in[{$URL},{HOST.HOST},bps]" changed: No "vmware collector" processes started.
1142677:20220327:151229.754 [Z3005] query failed: [0] PGRES_FATAL_ERROR:ERROR: invalid input syntax for type integer: "1648328400.000000"
LINE 1: ... partitions.history_p2022_03_27 (CHECK ((clock >= '164832840...
^
QUERY: CREATE TABLE IF NOT EXISTS partitions.history_p2022_03_27 (CHECK ((clock >= '1648328400.000000' AND clock < '1648414800.000000'))) INHERITS (history) TABLESPACE history_partitions;
CONTEXT: PL/pgSQL function trg_partition() line 38 at EXECUTE
[insert into history (itemid,clock,ns,value) values (872723,1648383149,527845142,0);
]
1142677:20220327:151229.809 syncing history data... 100.000000%
1142677:20220327:151229.809 syncing history data done
1142677:20220327:151229.809 syncing trend data...
1142677:20220327:151229.829 syncing trend data done
1142677:20220327:151229.867 Zabbix Server stopped. Zabbix 6.0.2 (revision d726a4d916).

eBPF: 'bpf_map_update()' returns the 'invalid indirect read from stack' error

I have an eBPF program with the following map definitions:
struct bpf_map_def SEC("maps") servers = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct ip_key),
.value_size = sizeof(struct dest_info),
.max_entries = MAX_SERVERS,
};
struct bpf_map_def SEC("maps") client_addrs = {
.type = BPF_MAP_TYPE_HASH,
.key_size = sizeof(struct port_key),
.value_size = sizeof(struct client_port_addr),
.max_entries = MAX_CLIENTS,
};
where the struct definitions are as below:
struct port_key {
__u16 port;
__u16 pad[3];
};
struct ip_key {
__u32 key;
__u32 pad;
};
struct dest_info {
__u32 saddr;
__u32 daddr;
__u64 bytes;
__u64 pkts;
__u8 dmac[6];
__u16 pad;
};
struct client_port_addr {
__u32 client_ip;
__u8 dmac[6];
__u16 pad[3];
};
The program itself, after the pointer verifications and initial checks, is shown below.
struct port_key key = {0};
struct client_port_addr val;
key.port = udp->source;
val.client_ip = iph->saddr;
memcpy (val.dmac, eth->h_source, 6 * sizeof(__u8));
bpf_map_update_elem(&client_addrs, &key, &val, BPF_ANY);
iph->saddr = IP_ADDRESS(BALANCER);
iph->daddr = dest_tnl->daddr;
memcpy (eth->h_source, eth->h_dest, 6 * sizeof(__u8));
memcpy (eth->h_dest, dest_tnl->dmac, 6 * sizeof(__u8));
The problem is that when I call bpf_map_update_elem() in my code, I get the invalid indirect read from stack error shown below.
libbpf:
0: (bf) r6 = r1
1: (61) r9 = *(u32 *)(r6 +4)
2: (61) r7 = *(u32 *)(r6 +0)
3: (18) r1 = 0xffffa59ac00b6000
5: (b7) r2 = 24
6: (85) call bpf_trace_printk#6
R1_w=map_value(id=0,off=0,ks=4,vs=50,imm=0) R2_w=inv24 R6_w=ctx(id=0,off=0,imm=0) R7_w=pkt(id=0,off=0,r=0,imm=0) R9_w=pkt_end(id=0,off=0,imm=0) R10=fp0
last_idx 6 first_idx 0
regs=4 stack=0 before 5: (b7) r2 = 24
7: (b7) r8 = 1
8: (bf) r1 = r7
9: (07) r1 += 14
10: (2d) if r1 > r9 goto pc+130
R0_w=inv(id=0) R1_w=pkt(id=0,off=14,r=14,imm=0) R6_w=ctx(id=0,off=0,imm=0) R7_w=pkt(id=0,off=0,r=14,imm=0) R8_w=inv1 R9_w=pkt_end(id=0,off=0,imm=0) R10=fp0
11: (71) r1 = *(u8 *)(r7 +12)
12: (71) r2 = *(u8 *)(r7 +13)
13: (67) r2 <<= 8
14: (4f) r2 |= r1
15: (b7) r8 = 2
16: (55) if r2 != 0x8 goto pc+124
R0=inv(id=0) R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2=inv8 R6=ctx(id=0,off=0,imm=0) R7=pkt(id=0,off=0,r=14,imm=0) R8=inv2 R9=pkt_end(id=0,off=0,imm=0) R10=fp0
17: (61) r7 = *(u32 *)(r6 +4)
18: (61) r9 = *(u32 *)(r6 +0)
19: (bf) r6 = r9
20: (07) r6 += 14
21: (b7) r8 = 1
22: (2d) if r6 > r7 goto pc+118
R0=inv(id=0) R1=inv(id=0,umax_value=255,var_off=(0x0; 0xff)) R2=inv8 R6_w=pkt(id=0,off=14,r=14,imm=0) R7_w=pkt_end(id=0,off=0,imm=0) R8_w=inv1 R9_w=pkt(id=0,off=0,r=14,imm=0) R10=fp0
23: (bf) r1 = r9
24: (07) r1 += 34
25: (b7) r8 = 1
26: (2d) if r1 > r7 goto pc+114
R0=inv(id=0) R1=pkt(id=0,off=34,r=34,imm=0) R2=inv8 R6=pkt(id=0,off=14,r=34,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8=inv1 R9=pkt(id=0,off=0,r=34,imm=0) R10=fp0
27: (71) r1 = *(u8 *)(r6 +0)
28: (57) r1 &= 15
29: (b7) r8 = 1
30: (55) if r1 != 0x5 goto pc+110
R0=inv(id=0) R1_w=inv5 R2=inv8 R6=pkt(id=0,off=14,r=34,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8_w=inv1 R9=pkt(id=0,off=0,r=34,imm=0) R10=fp0
31: (61) r3 = *(u32 *)(r9 +26)
32: (18) r1 = 0xffffa59ac00b6018
34: (b7) r2 = 26
35: (85) call bpf_trace_printk#6
R0=inv(id=0) R1_w=map_value(id=0,off=24,ks=4,vs=50,imm=0) R2_w=inv26 R3_w=inv(id=0,umax_value=4294967295,var_off=(0x0; 0xffffffff)) R6=pkt(id=0,off=14,r=34,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8_w=inv1 R9=pkt(id=0,off=0,r=34,imm=0) R10=fp0
last_idx 35 first_idx 26
regs=4 stack=0 before 34: (b7) r2 = 26
36: (69) r1 = *(u16 *)(r9 +20)
37: (57) r1 &= 65343
38: (b7) r8 = 1
39: (55) if r1 != 0x0 goto pc+101
R0=inv(id=0) R1_w=inv0 R6=pkt(id=0,off=14,r=34,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8_w=inv1 R9=pkt(id=0,off=0,r=34,imm=0) R10=fp0
40: (71) r1 = *(u8 *)(r9 +23)
41: (b7) r8 = 2
42: (55) if r1 != 0x11 goto pc+98
R0=inv(id=0) R1_w=inv17 R6=pkt(id=0,off=14,r=34,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8_w=inv2 R9=pkt(id=0,off=0,r=34,imm=0) R10=fp0
43: (bf) r1 = r9
44: (07) r1 += 42
45: (b7) r8 = 1
46: (2d) if r1 > r7 goto pc+94
R0=inv(id=0) R1=pkt(id=0,off=42,r=42,imm=0) R6=pkt(id=0,off=14,r=42,imm=0) R7=pkt_end(id=0,off=0,imm=0) R8=inv1 R9=pkt(id=0,off=0,r=42,imm=0) R10=fp0
47: (b7) r8 = 0
48: (7b) *(u64 *)(r10 -8) = r8
last_idx 48 first_idx 46
regs=100 stack=0 before 47: (b7) r8 = 0
49: (bf) r2 = r10
50: (07) r2 += -8
51: (18) r1 = 0xffff9a7bed1bc000
53: (85) call bpf_map_lookup_elem#1
54: (bf) r7 = r0
55: (15) if r7 == 0x0 goto pc+85
R0=map_value(id=0,off=0,ks=8,vs=32,imm=0) R6=pkt(id=0,off=14,r=42,imm=0) R7=map_value(id=0,off=0,ks=8,vs=32,imm=0) R8=invP0 R9=pkt(id=0,off=0,r=42,imm=0) R10=fp0 fp-8=mmmmmmmm
56: (b7) r8 = 0
57: (7b) *(u64 *)(r10 -16) = r8
last_idx 57 first_idx 55
regs=100 stack=0 before 56: (b7) r8 = 0
58: (69) r1 = *(u16 *)(r9 +34)
59: (6b) *(u16 *)(r10 -16) = r1
60: (61) r1 = *(u32 *)(r9 +26)
61: (63) *(u32 *)(r10 -32) = r1
62: (71) r1 = *(u8 *)(r9 +11)
63: (73) *(u8 *)(r10 -23) = r1
64: (71) r1 = *(u8 *)(r9 +10)
65: (73) *(u8 *)(r10 -24) = r1
66: (71) r1 = *(u8 *)(r9 +7)
67: (67) r1 <<= 8
68: (71) r2 = *(u8 *)(r9 +6)
69: (4f) r1 |= r2
70: (71) r2 = *(u8 *)(r9 +9)
71: (67) r2 <<= 8
72: (71) r3 = *(u8 *)(r9 +8)
73: (4f) r2 |= r3
74: (67) r2 <<= 16
75: (4f) r2 |= r1
76: (63) *(u32 *)(r10 -28) = r2
77: (bf) r2 = r10
78: (07) r2 += -16
79: (bf) r3 = r10
80: (07) r3 += -32
81: (18) r1 = 0xffff9a7bed1bf400
83: (b7) r4 = 0
84: (85) call bpf_map_update_elem#2
invalid indirect read from stack R3 off -32+10 size 16
processed 81 insns (limit 1000000) max_states_per_insn 0 total_states 5 peak_states 5 mark_read 2
libbpf: -- END LOG --
libbpf: failed to load program 'loadbal'
All of the defined structs for keys and values are padded to their next multiple of 8 bytes. Since I could not find any useful and descriptive explanation on my issue, explanations of this topic and maybe even a bit of detail are much appreciated.
Please let me know if you need more information.
The verifier complains because your code is trying to read uninitialised data from the stack, in particular in your variable val.
If we look at your code:
struct client_port_addr {
__u32 client_ip;
__u8 dmac[6];
__u16 pad[3];
};
struct client_port_addr val;
[...]
val.client_ip = iph->saddr; // val.client_ip
memcpy (val.dmac, eth->h_source, 6 * sizeof(__u8)); // val.dmac
// val.pad where??
bpf_map_update_elem(&client_addrs, &key, &val, BPF_ANY);
You initialised val.client_ip, and val.dmac, but val.pad is never initialised. When you pass val to bpf_map_update_elem(), the eBPF verifier realises that the helper function might read this variable which contains uninitialised memory from kernel space. This is a security risk, therefore, the verifier rejects the program.
To fix the issue, make sure you initialise the memory before using it. You have at least three ways to do so:
You could initialise val when declaring it, like for your key:
struct client_port_addr val = {0};
This should work in your case, but is not generally recommended: it sets all fields to 0, but if your struct contains padding that was not explicitly added, that padding may remain uninitialised.
In your case, you could fill val.pad with zeroes with memcpy(). Same as the first option, this won't help if the compiler pads your struct.
The safest option would be to memset() the struct after declaring it:
struct client_port_addr val;
memset(&val, 0, sizeof(val));
Then you can fill the relevant fields of the struct, and pass it to the map update helper.
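As a concrete illustration of the memset() option, here is a minimal user-space C sketch. `fill_value` is a hypothetical helper name, and the struct is copied from the question with fixed-width stdint types standing in for the kernel's __u32, __u8 and __u16. client_ip and dmac occupy bytes 0 to 9 of the 16-byte value, so the untouched pad starts at byte 10, which matches the "off -32+10 size 16" in the verifier message.

```c
#include <stdint.h>
#include <string.h>

/* Value layout copied from the question. */
struct client_port_addr {
    uint32_t client_ip;
    uint8_t  dmac[6];
    uint16_t pad[3];
};

/*
 * Fill `val` so that every byte is initialised: zero the whole struct
 * first (explicit `pad` plus any compiler-inserted padding), then set
 * the meaningful fields.
 */
static void fill_value(struct client_port_addr *val,
                       uint32_t client_ip, const uint8_t mac[6])
{
    memset(val, 0, sizeof(*val));
    val->client_ip = client_ip;
    memcpy(val->dmac, mac, sizeof(val->dmac));
}
```

In the BPF program the same two steps would be written inline before the bpf_map_update_elem() call, so that the helper never sees uninitialised stack bytes.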

Solving linear equation systems in Coq [closed]

I need to prove that this system of equations does not have a solution (the reason is that it is over-determined). Is there an easy way to do it in Coq? I.e. a tactic or library?
Require Import Reals.
Open Scope R.
Lemma no_solution:
forall
b11 b12 b13 b14 b21 b22 b23 b24 b31 b32 b33 b34
r r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 : R,
1 = r * b11 + r0 * b21 + r1 * b31 ->
0 = r * b12 + r0 * b22 + r1 * b32 ->
0 = r * b13 + r0 * b23 + r1 * b33 ->
0 = r * b14 + r0 * b24 + r1 * b34 ->
0 = r2 * b11 + r3 * b21 + r4 * b31 ->
1 = r2 * b12 + r3 * b22 + r4 * b32 ->
0 = r2 * b13 + r3 * b23 + r4 * b33 ->
0 = r2 * b14 + r3 * b24 + r4 * b34 ->
0 = r5 * b11 + r6 * b21 + r7 * b31 ->
0 = r5 * b12 + r6 * b22 + r7 * b32 ->
1 = r5 * b13 + r6 * b23 + r7 * b33 ->
0 = r5 * b14 + r6 * b24 + r7 * b34 ->
0 = r8 * b11 + r9 * b21 + r10 * b31 ->
0 = r8 * b12 + r9 * b22 + r10 * b32 ->
0 = r8 * b13 + r9 * b23 + r10 * b33 ->
1 = r8 * b14 + r9 * b24 + r10 * b34 ->
False.
If I understand well, this bunch of equations cannot be simultaneously true because it would require the rank of a 3x4 matrix to be higher than 3.
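Spelled out, the argument runs as follows. The names $A$, $B$ and $K$ are chosen here for illustration: $A$ is the $4 \times 3$ matrix whose rows are $(r, r_0, r_1)$, $(r_2, r_3, r_4)$, $(r_5, r_6, r_7)$, $(r_8, r_9, r_{10})$, $B = (b_{ij})$ is the $3 \times 4$ matrix of the remaining unknowns, and $K$ is the underlying field. The sixteen hypotheses say exactly that $AB = I_4$, and the rank bound then gives a contradiction:

```latex
A \in K^{4 \times 3},\quad B \in K^{3 \times 4},\quad AB = I_4
\;\Longrightarrow\;
4 = \operatorname{rank}(I_4) = \operatorname{rank}(AB)
  \le \min\bigl(\operatorname{rank} A,\ \operatorname{rank} B\bigr) \le 3 .
```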
The main theorem for your result is called mulmx_max_rank in the mathematical components library. I had more work connecting your unstructured presentation of the problem to a structured one using matrices than finding the right theorem. This experiment was made in coq-8.7, with coq-mathcomp-ssreflect and coq-mathcomp-algebra loaded through opam (version 1.6.2 of the packages).
Note that this result holds for any field structure.
From mathcomp Require Import all_ssreflect all_algebra.
Set Implicit Arguments.
Unset Strict Implicit.
Unset Printing Implicit Defensive.
Import GRing.Theory Num.Theory.
Open Scope ring_scope.
Section Solving_linear_equation_systems_in_Coq.
Variable R : fieldType.
Definition seq2matrix m n (s : seq (seq R)) :=
\matrix_(i < m, j < n)
nth 0 (nth nil s i) j.
Lemma no_solution:
forall
b11 b12 b13 b14 b21 b22 b23 b24 b31 b32 b33 b34
r r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 : R,
1 = r * b11 + r0 * b21 + r1 * b31 ->
0 = r * b12 + r0 * b22 + r1 * b32 ->
0 = r * b13 + r0 * b23 + r1 * b33 ->
0 = r * b14 + r0 * b24 + r1 * b34 ->
0 = r2 * b11 + r3 * b21 + r4 * b31 ->
1 = r2 * b12 + r3 * b22 + r4 * b32 ->
0 = r2 * b13 + r3 * b23 + r4 * b33 ->
0 = r2 * b14 + r3 * b24 + r4 * b34 ->
0 = r5 * b11 + r6 * b21 + r7 * b31 ->
0 = r5 * b12 + r6 * b22 + r7 * b32 ->
1 = r5 * b13 + r6 * b23 + r7 * b33 ->
0 = r5 * b14 + r6 * b24 + r7 * b34 ->
0 = r8 * b11 + r9 * b21 + r10 * b31 ->
0 = r8 * b12 + r9 * b22 + r10 * b32 ->
0 = r8 * b13 + r9 * b23 + r10 * b33 ->
1 = r8 * b14 + r9 * b24 + r10 * b34 ->
False.
Proof.
move => b11 b12 b13 b14 b21 b22 b23 b24 b31 b32 b33 b34
r r0 r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 eq1 eq2 eq3 eq4
eq5 eq6 eq7 eq8 eq9 eq10 eq11 eq12 eq13 eq14 eq15 eq16.
set Inp := seq2matrix 4 3
[:: [:: r; r0; r1];
[:: r2; r3; r4];
[:: r5; r6; r7];
[:: r8; r9; r10]].
set B := seq2matrix 3 4 [:: [:: b11; b12; b13; b14];
[:: b21; b22; b23; b24];
[:: b31; b32; b33; b34]].
suff abs: Inp *m B = 1%:M.
have : (\rank (Inp *m B) <= 3)%N by apply: mulmx_max_rank.
by rewrite abs mxrank1.
by apply/matrixP=> [[ [ | [ | [ | [ | ?]]]] pi]]
[ [ | [ | [ | [ | ?]]]] pj] //;
rewrite /Inp /seq2matrix /= !(mxE, big_ord_recr, big_ord0) //= add0r /=
-?(eq1, eq2, eq3, eq4, eq5, eq6, eq7, eq8, eq9, eq10, eq11, eq12).
Qed.
End Solving_linear_equation_systems_in_Coq.

scala multiple assignment efficiency

Is multiple assignment (e.g. val (x, y) = (1, 2)) less efficient at runtime than the corresponding single assignments (val x = 1; val y = 2)?
I can imagine the answer being "yes" because scala might need to construct intermediate tuples. Is this correct?
What if I had an extra tuple laying around, e.g. val tup = (1, 2)
Now is it more efficient to do:
(a) val (x, y) = tup
OR
(b) val x = tup._1; val y = tup._2
or are they the same?
The difference from the previous example is that the RHS no longer needs to be allocated.
You can use the new :javap feature of the scala 2.9 REPL:
scala> class A { val (a, b) = (1, 2) }
scala> :javap -c A
Compiled from "<console>"
public class A extends java.lang.Object implements scala.ScalaObject{
...
public A();
Code:
0: aload_0
1: invokespecial #22; //Method java/lang/Object."<init>":()V
4: aload_0
5: new #24; //class scala/Tuple2$mcII$sp
8: dup
9: iconst_1
10: iconst_2
11: invokespecial #27; //Method scala/Tuple2$mcII$sp."<init>":(II)V
14: astore_1
15: aload_1
16: ifnull 68
19: aload_1
20: astore_2
21: new #24; //class scala/Tuple2$mcII$sp
24: dup
25: aload_2
26: invokevirtual #33; //Method scala/Tuple2._1:()Ljava/lang/Object;
29: invokestatic #39; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
32: aload_2
33: invokevirtual #42; //Method scala/Tuple2._2:()Ljava/lang/Object;
36: invokestatic #39; //Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I
39: invokespecial #27; //Method scala/Tuple2$mcII$sp."<init>":(II)V
42: putfield #44; //Field x$1:Lscala/Tuple2;
45: aload_0
46: aload_0
47: getfield #44; //Field x$1:Lscala/Tuple2;
50: invokevirtual #47; //Method scala/Tuple2._1$mcI$sp:()I
53: putfield #14; //Field a:I
56: aload_0
57: aload_0
58: getfield #44; //Field x$1:Lscala/Tuple2;
61: invokevirtual #50; //Method scala/Tuple2._2$mcI$sp:()I
64: putfield #16; //Field b:I
67: return
68: new #52; //class scala/MatchError
71: dup
72: aload_1
73: invokespecial #55; //Method scala/MatchError."<init>":(Ljava/lang/Object;)V
76: athrow
}
scala> class B { val a = 1; val b = 2 }
scala> :javap -c B
Compiled from "<console>"
public class B extends java.lang.Object implements scala.ScalaObject{
...
public B();
Code:
0: aload_0
1: invokespecial #20; //Method java/lang/Object."<init>":()V
4: aload_0
5: iconst_1
6: putfield #12; //Field a:I
9: aload_0
10: iconst_2
11: putfield #14; //Field b:I
14: return
}
So I guess the answer is that the tuple version is slower. I wonder why there is boxing going on; shouldn't that be gone with the specialization of tuples?
I wasn't really satisfied with the lack of benchmarks, so here are some benchmarks done using https://github.com/sirthias/scala-benchmarking-template, which uses Google Caliper in the background. The charts contain the calculated ns per inner-loop execution, but the text results are taken directly from the console. The code:
package org.example

import annotation.tailrec
import com.google.caliper.Param

class Benchmark extends SimpleScalaBenchmark {
  // Array sizes to benchmark; Caliper injects each value into `length`.
  @Param(Array("10", "100", "1000", "10000"))
  val length: Int = 0

  var array: Array[Int] = _

  override def setUp() {
    array = new Array(length)
  }

  // Regular: destructure a Tuple2 with a pattern, val (a, b) = tuple.
  def timeRegular(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach {value => {
      val tuple = (value, value)
      val (out1, out2) = tuple
      result += out1
      result += out2
    }}
    result
  }

  // Unpack: read a Tuple2 through the _1 and _2 accessors.
  def timeUnpack(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach {value => {
      val tuple = (value, value)
      val out1 = tuple._1
      val out2 = tuple._2
      result += out1
      result += out2
    }}
    result
  }

  // BoxedUnpack: read a Tuple3 through its accessors.
  def timeBoxedUnpack(reps: Int) = repeat(reps) {
    var result = 0
    array.foreach {value => {
      val tuple = (value, value, value)
      val out1 = tuple._1
      val out2 = tuple._2
      val out3 = tuple._3
      result += out1
      result += out2
      result += out3
    }}
    result
  }
}
Scala 2.9.2
0% Scenario{vm=java, trial=0, benchmark=Regular, length=10} 102.09 ns; σ=1.04 ns @ 10 trials
8% Scenario{vm=java, trial=0, benchmark=Unpack, length=10} 28.23 ns; σ=0.27 ns @ 6 trials
17% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10} 110.17 ns; σ=1.95 ns @ 10 trials
25% Scenario{vm=java, trial=0, benchmark=Regular, length=100} 909.73 ns; σ=6.42 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Unpack, length=100} 271.40 ns; σ=1.35 ns @ 3 trials
42% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=100} 946.59 ns; σ=8.38 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=Regular, length=1000} 8966.33 ns; σ=40.17 ns @ 3 trials
58% Scenario{vm=java, trial=0, benchmark=Unpack, length=1000} 2517.54 ns; σ=4.56 ns @ 3 trials
67% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=1000} 9374.71 ns; σ=68.25 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=Regular, length=10000} 81244.84 ns; σ=661.81 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=Unpack, length=10000} 23502.73 ns; σ=122.83 ns @ 3 trials
92% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10000} 112683.27 ns; σ=1101.51 ns @ 4 trials
length benchmark ns linear runtime
10 Regular 102.1 =
10 Unpack 28.2 =
10 BoxedUnpack 110.2 =
100 Regular 909.7 =
100 Unpack 271.4 =
100 BoxedUnpack 946.6 =
1000 Regular 8966.3 ==
1000 Unpack 2517.5 =
1000 BoxedUnpack 9374.7 ==
10000 Regular 81244.8 =====================
10000 Unpack 23502.7 ======
10000 BoxedUnpack 112683.3 ==============================
Scala 2.10.3
0% Scenario{vm=java, trial=0, benchmark=Regular, length=10} 28.26 ns; σ=0.13 ns @ 3 trials
8% Scenario{vm=java, trial=0, benchmark=Unpack, length=10} 28.27 ns; σ=0.07 ns @ 3 trials
17% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10} 109.56 ns; σ=2.27 ns @ 10 trials
25% Scenario{vm=java, trial=0, benchmark=Regular, length=100} 273.40 ns; σ=2.73 ns @ 5 trials
33% Scenario{vm=java, trial=0, benchmark=Unpack, length=100} 271.25 ns; σ=2.63 ns @ 6 trials
42% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=100} 1088.00 ns; σ=10.60 ns @ 3 trials
50% Scenario{vm=java, trial=0, benchmark=Regular, length=1000} 2516.30 ns; σ=7.13 ns @ 3 trials
58% Scenario{vm=java, trial=0, benchmark=Unpack, length=1000} 2525.00 ns; σ=24.25 ns @ 6 trials
67% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=1000} 10188.98 ns; σ=101.32 ns @ 3 trials
75% Scenario{vm=java, trial=0, benchmark=Regular, length=10000} 25886.80 ns; σ=116.33 ns @ 3 trials
83% Scenario{vm=java, trial=0, benchmark=Unpack, length=10000} 25938.97 ns; σ=76.02 ns @ 3 trials
92% Scenario{vm=java, trial=0, benchmark=BoxedUnpack, length=10000} 115629.82 ns; σ=1159.41 ns @ 5 trials
length benchmark ns linear runtime
10 Regular 28.3 =
10 Unpack 28.3 =
10 BoxedUnpack 109.6 =
100 Regular 273.4 =
100 Unpack 271.2 =
100 BoxedUnpack 1088.0 =
1000 Regular 2516.3 =
1000 Unpack 2525.0 =
1000 BoxedUnpack 10189.0 ==
10000 Regular 25886.8 ======
10000 Unpack 25939.0 ======
10000 BoxedUnpack 115629.8 ==============================
Conclusion
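Unpacking tuples is fast as long as the tuple arity is <= 2. If it is greater than 2, there is too much indirection for the HotSpot compiler to optimise: the three-element case (BoxedUnpack) is roughly 3.5 to 5 times slower than the two-element Unpack case at every array length.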
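Scala 2.9.2 has some sort of weird issue that makes the pattern form (val (out1, out2) = tuple, the Regular benchmark) roughly 3.5 times slower than the accessor form (Unpack). In 2.10.3 the two run at the same speed, so the 2.9.2 difference can probably be disregarded.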
The benchmarks were run using:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
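The arity cutoff in the conclusion lines up with which tuple classes have specialized variants in the standard library. Here is a minimal sketch for checking this yourself, assuming a Scala 2.x compiler; the object and method names (TupleClasses, pairClassName, tripleClassName) are made up for illustration and are not part of the benchmark above.

```scala
object TupleClasses {
  // Runtime class of a two-Int tuple. On Scala 2.x this is expected to be the
  // Int-specialized subclass that also shows up in the javap output,
  // scala.Tuple2$mcII$sp, which stores its fields unboxed.
  def pairClassName: String = (1, 2).getClass.getName

  // Runtime class of a three-Int tuple. Tuple3 has no specialized variants,
  // so this is plain scala.Tuple3 and its Int elements are stored boxed.
  def tripleClassName: String = (1, 2, 3).getClass.getName

  def main(args: Array[String]): Unit = {
    println(pairClassName)
    println(tripleClassName)
  }
}
```

If the first name carries the $mcII$sp suffix and the second does not, every read of a Tuple3 element goes through unboxing, which would be consistent with the BoxedUnpack numbers.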
Just execute all the options a million times and measure how long it took by calling System.currentTimeMillis. In theory, multiple assignment should be less efficient, but it might be optimized away.
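A rough sketch of that approach could look like the following. The names NaiveTiming, timeMillis, viaPattern and viaAccessors are invented for this example, and the warm-up pass is an added assumption to give the JIT a chance to compile both variants first. Timings from a loop like this are noisy, so treat them as a hint rather than a measurement.

```scala
object NaiveTiming {
  // Runs `body` `reps` times and returns (elapsed milliseconds, accumulated sum).
  // The sum is returned so the JIT cannot discard the loop body as dead code.
  def timeMillis(reps: Int)(body: => Int): (Long, Int) = {
    var sink = 0
    var i = 0
    val start = System.currentTimeMillis()
    while (i < reps) {
      sink += body
      i += 1
    }
    (System.currentTimeMillis() - start, sink)
  }

  // Option (a): destructure with a pattern.
  def viaPattern(tup: (Int, Int)): Int = {
    val (x, y) = tup
    x + y
  }

  // Option (b): read the fields through the accessors.
  def viaAccessors(tup: (Int, Int)): Int = {
    val x = tup._1
    val y = tup._2
    x + y
  }

  def main(args: Array[String]): Unit = {
    val tup = (1, 2)
    val reps = 1000000
    // Warm-up pass, results discarded.
    timeMillis(reps)(viaPattern(tup))
    timeMillis(reps)(viaAccessors(tup))
    val (msPattern, _) = timeMillis(reps)(viaPattern(tup))
    val (msAccessors, _) = timeMillis(reps)(viaAccessors(tup))
    println("pattern: " + msPattern + " ms, accessors: " + msAccessors + " ms")
  }
}
```

With only a million iterations of such a small body, both timings may come out at or near 0 ms on a modern JVM, so the repetition count usually needs to be raised before any difference is visible.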