diff --git a/Documentation/networking/filter.txt b/Documentation/networking/filter.txt index cdb3e40b9d14..a06b48d2f5cc 100644 --- a/Documentation/networking/filter.txt +++ b/Documentation/networking/filter.txt @@ -1,49 +1,563 @@ -filter.txt: Linux Socket Filtering -Written by: Jay Schulist +Linux Socket Filtering aka Berkeley Packet Filter (BPF) +======================================================= Introduction -============ +------------ - Linux Socket Filtering is derived from the Berkeley -Packet Filter. There are some distinct differences between -the BSD and Linux Kernel Filtering. +Linux Socket Filtering (LSF) is derived from the Berkeley Packet Filter. +Though there are some distinct differences between the BSD and Linux +Kernel filtering, but when we speak of BPF or LSF in Linux context, we +mean the very same mechanism of filtering in the Linux kernel. -Linux Socket Filtering (LSF) allows a user-space program to -attach a filter onto any socket and allow or disallow certain -types of data to come through the socket. LSF follows exactly -the same filter code structure as the BSD Berkeley Packet Filter -(BPF), so referring to the BSD bpf.4 manpage is very helpful in -creating filters. +BPF allows a user-space program to attach a filter onto any socket and +allow or disallow certain types of data to come through the socket. LSF +follows exactly the same filter code structure as BSD's BPF, so referring +to the BSD bpf.4 manpage is very helpful in creating filters. -LSF is much simpler than BPF. One does not have to worry about -devices or anything like that. You simply create your filter -code, send it to the kernel via the SO_ATTACH_FILTER option and -if your filter code passes the kernel check on it, you then -immediately begin filtering data on that socket. +On Linux, BPF is much simpler than on BSD. One does not have to worry +about devices or anything like that. You simply create your filter code, +send it to the kernel via the SO_ATTACH_FILTER option and if your filter +code passes the kernel check on it, you then immediately begin filtering +data on that socket. -You can also detach filters from your socket via the -SO_DETACH_FILTER option. This will probably not be used much -since when you close a socket that has a filter on it the -filter is automagically removed. The other less common case -may be adding a different filter on the same socket where you had another -filter that is still running: the kernel takes care of removing -the old one and placing your new one in its place, assuming your -filter has passed the checks, otherwise if it fails the old filter -will remain on that socket. +You can also detach filters from your socket via the SO_DETACH_FILTER +option. This will probably not be used much since when you close a socket +that has a filter on it the filter is automagically removed. The other +less common case may be adding a different filter on the same socket where +you had another filter that is still running: the kernel takes care of +removing the old one and placing your new one in its place, assuming your +filter has passed the checks, otherwise if it fails the old filter will +remain on that socket. -SO_LOCK_FILTER option allows to lock the filter attached to a -socket. Once set, a filter cannot be removed or changed. This allows -one process to setup a socket, attach a filter, lock it then drop -privileges and be assured that the filter will be kept until the -socket is closed. +SO_LOCK_FILTER option allows to lock the filter attached to a socket. Once +set, a filter cannot be removed or changed. This allows one process to +setup a socket, attach a filter, lock it then drop privileges and be +assured that the filter will be kept until the socket is closed. -Examples -======== +The biggest user of this construct might be libpcap. Issuing a high-level +filter command like `tcpdump -i em1 port 22` passes through the libpcap +internal compiler that generates a structure that can eventually be loaded +via SO_ATTACH_FILTER to the kernel. `tcpdump -i em1 port 22 -ddd` +displays what is being placed into this structure. -Ioctls- -setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &Filter, sizeof(Filter)); -setsockopt(sockfd, SOL_SOCKET, SO_DETACH_FILTER, &value, sizeof(value)); -setsockopt(sockfd, SOL_SOCKET, SO_LOCK_FILTER, &value, sizeof(value)); +Although we were only speaking about sockets here, BPF in Linux is used +in many more places. There's xt_bpf for netfilter, cls_bpf in the kernel +qdisc layer, SECCOMP-BPF (SECure COMPuting [1]), and lots of other places +such as team driver, PTP code, etc where BPF is being used. -See the BSD bpf.4 manpage and the BSD Packet Filter paper written by -Steven McCanne and Van Jacobson of Lawrence Berkeley Laboratory. + [1] Documentation/prctl/seccomp_filter.txt + +Original BPF paper: + +Steven McCanne and Van Jacobson. 1993. The BSD packet filter: a new +architecture for user-level packet capture. In Proceedings of the +USENIX Winter 1993 Conference Proceedings on USENIX Winter 1993 +Conference Proceedings (USENIX'93). USENIX Association, Berkeley, +CA, USA, 2-2. [http://www.tcpdump.org/papers/bpf-usenix93.pdf] + +Structure +--------- + +User space applications include which contains the +following relevant structures: + +struct sock_filter { /* Filter block */ + __u16 code; /* Actual filter code */ + __u8 jt; /* Jump true */ + __u8 jf; /* Jump false */ + __u32 k; /* Generic multiuse field */ +}; + +Such a structure is assembled as an array of 4-tuples, that contains +a code, jt, jf and k value. jt and jf are jump offsets and k a generic +value to be used for a provided code. + +struct sock_fprog { /* Required for SO_ATTACH_FILTER. */ + unsigned short len; /* Number of filter blocks */ + struct sock_filter __user *filter; +}; + +For socket filtering, a pointer to this structure (as shown in +follow-up example) is being passed to the kernel through setsockopt(2). + +Example +------- + +#include +#include +#include +#include +/* ... */ + +/* From the example above: tcpdump -i em1 port 22 -dd */ +struct sock_filter code[] = { + { 0x28, 0, 0, 0x0000000c }, + { 0x15, 0, 8, 0x000086dd }, + { 0x30, 0, 0, 0x00000014 }, + { 0x15, 2, 0, 0x00000084 }, + { 0x15, 1, 0, 0x00000006 }, + { 0x15, 0, 17, 0x00000011 }, + { 0x28, 0, 0, 0x00000036 }, + { 0x15, 14, 0, 0x00000016 }, + { 0x28, 0, 0, 0x00000038 }, + { 0x15, 12, 13, 0x00000016 }, + { 0x15, 0, 12, 0x00000800 }, + { 0x30, 0, 0, 0x00000017 }, + { 0x15, 2, 0, 0x00000084 }, + { 0x15, 1, 0, 0x00000006 }, + { 0x15, 0, 8, 0x00000011 }, + { 0x28, 0, 0, 0x00000014 }, + { 0x45, 6, 0, 0x00001fff }, + { 0xb1, 0, 0, 0x0000000e }, + { 0x48, 0, 0, 0x0000000e }, + { 0x15, 2, 0, 0x00000016 }, + { 0x48, 0, 0, 0x00000010 }, + { 0x15, 0, 1, 0x00000016 }, + { 0x06, 0, 0, 0x0000ffff }, + { 0x06, 0, 0, 0x00000000 }, +}; + +struct sock_fprog bpf = { + .len = ARRAY_SIZE(code), + .filter = code, +}; + +sock = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL)); +if (sock < 0) + /* ... bail out ... */ + +ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf)); +if (ret < 0) + /* ... bail out ... */ + +/* ... */ +close(sock); + +The above example code attaches a socket filter for a PF_PACKET socket +in order to let all IPv4/IPv6 packets with port 22 pass. The rest will +be dropped for this socket. + +The setsockopt(2) call to SO_DETACH_FILTER doesn't need any arguments +and SO_LOCK_FILTER for preventing the filter to be detached, takes an +integer value with 0 or 1. + +Note that socket filters are not restricted to PF_PACKET sockets only, +but can also be used on other socket families. + +Summary of system calls: + + * setsockopt(sockfd, SOL_SOCKET, SO_ATTACH_FILTER, &val, sizeof(val)); + * setsockopt(sockfd, SOL_SOCKET, SO_DETACH_FILTER, &val, sizeof(val)); + * setsockopt(sockfd, SOL_SOCKET, SO_LOCK_FILTER, &val, sizeof(val)); + +Normally, most use cases for socket filtering on packet sockets will be +covered by libpcap in high-level syntax, so as an application developer +you should stick to that. libpcap wraps its own layer around all that. + +Unless i) using/linking to libpcap is not an option, ii) the required BPF +filters use Linux extensions that are not supported by libpcap's compiler, +iii) a filter might be more complex and not cleanly implementable with +libpcap's compiler, or iv) particular filter codes should be optimized +differently than libpcap's internal compiler does; then in such cases +writing such a filter "by hand" can be of an alternative. For example, +xt_bpf and cls_bpf users might have requirements that could result in +more complex filter code, or one that cannot be expressed with libpcap +(e.g. different return codes for various code paths). Moreover, BPF JIT +implementors may wish to manually write test cases and thus need low-level +access to BPF code as well. + +BPF engine and instruction set +------------------------------ + +Under tools/net/ there's a small helper tool called bpf_asm which can +be used to write low-level filters for example scenarios mentioned in the +previous section. Asm-like syntax mentioned here has been implemented in +bpf_asm and will be used for further explanations (instead of dealing with +less readable opcodes directly, principles are the same). The syntax is +closely modelled after Steven McCanne's and Van Jacobson's BPF paper. + +The BPF architecture consists of the following basic elements: + + Element Description + + A 32 bit wide accumulator + X 32 bit wide X register + M[] 16 x 32 bit wide misc registers aka "scratch memory + store", addressable from 0 to 15 + +A program, that is translated by bpf_asm into "opcodes" is an array that +consists of the following elements (as already mentioned): + + op:16, jt:8, jf:8, k:32 + +The element op is a 16 bit wide opcode that has a particular instruction +encoded. jt and jf are two 8 bit wide jump targets, one for condition +"jump if true", the other one "jump if false". Eventually, element k +contains a miscellaneous argument that can be interpreted in different +ways depending on the given instruction in op. + +The instruction set consists of load, store, branch, alu, miscellaneous +and return instructions that are also represented in bpf_asm syntax. This +table lists all bpf_asm instructions available resp. what their underlying +opcodes as defined in linux/filter.h stand for: + + Instruction Addressing mode Description + + ld 1, 2, 3, 4, 10 Load word into A + ldi 4 Load word into A + ldh 1, 2 Load half-word into A + ldb 1, 2 Load byte into A + ldx 3, 4, 5, 10 Load word into X + ldxi 4 Load word into X + ldxb 5 Load byte into X + + st 3 Store A into M[] + stx 3 Store X into M[] + + jmp 6 Jump to label + ja 6 Jump to label + jeq 7, 8 Jump on k == A + jneq 8 Jump on k != A + jne 8 Jump on k != A + jlt 8 Jump on k < A + jle 8 Jump on k <= A + jgt 7, 8 Jump on k > A + jge 7, 8 Jump on k >= A + jset 7, 8 Jump on k & A + + add 0, 4 A + + sub 0, 4 A - + mul 0, 4 A * + div 0, 4 A / + mod 0, 4 A % + neg 0, 4 !A + and 0, 4 A & + or 0, 4 A | + xor 0, 4 A ^ + lsh 0, 4 A << + rsh 0, 4 A >> + + tax Copy A into X + txa Copy X into A + + ret 4, 9 Return + +The next table shows addressing formats from the 2nd column: + + Addressing mode Syntax Description + + 0 x/%x Register X + 1 [k] BHW at byte offset k in the packet + 2 [x + k] BHW at the offset X + k in the packet + 3 M[k] Word at offset k in M[] + 4 #k Literal value stored in k + 5 4*([k]&0xf) Lower nibble * 4 at byte offset k in the packet + 6 L Jump label L + 7 #k,Lt,Lf Jump to Lt if true, otherwise jump to Lf + 8 #k,Lt Jump to Lt if predicate is true + 9 a/%a Accumulator A + 10 extension BPF extension + +The Linux kernel also has a couple of BPF extensions that are used along +with the class of load instructions by "overloading" the k argument with +a negative offset + a particular extension offset. The result of such BPF +extensions are loaded into A. + +Possible BPF extensions are shown in the following table: + + Extension Description + + len skb->len + proto skb->protocol + type skb->pkt_type + poff Payload start offset + ifidx skb->dev->ifindex + nla Netlink attribute of type X with offset A + nlan Nested Netlink attribute of type X with offset A + mark skb->mark + queue skb->queue_mapping + hatype skb->dev->type + rxhash skb->rxhash + cpu raw_smp_processor_id() + vlan_tci vlan_tx_tag_get(skb) + vlan_pr vlan_tx_tag_present(skb) + +These extensions can also be prefixed with '#'. +Examples for low-level BPF: + +** ARP packets: + + ldh [12] + jne #0x806, drop + ret #-1 + drop: ret #0 + +** IPv4 TCP packets: + + ldh [12] + jne #0x800, drop + ldb [23] + jneq #6, drop + ret #-1 + drop: ret #0 + +** (Accelerated) VLAN w/ id 10: + + ld vlan_tci + jneq #10, drop + ret #-1 + drop: ret #0 + +** SECCOMP filter example: + + ld [4] /* offsetof(struct seccomp_data, arch) */ + jne #0xc000003e, bad /* AUDIT_ARCH_X86_64 */ + ld [0] /* offsetof(struct seccomp_data, nr) */ + jeq #15, good /* __NR_rt_sigreturn */ + jeq #231, good /* __NR_exit_group */ + jeq #60, good /* __NR_exit */ + jeq #0, good /* __NR_read */ + jeq #1, good /* __NR_write */ + jeq #5, good /* __NR_fstat */ + jeq #9, good /* __NR_mmap */ + jeq #14, good /* __NR_rt_sigprocmask */ + jeq #13, good /* __NR_rt_sigaction */ + jeq #35, good /* __NR_nanosleep */ + bad: ret #0 /* SECCOMP_RET_KILL */ + good: ret #0x7fff0000 /* SECCOMP_RET_ALLOW */ + +The above example code can be placed into a file (here called "foo"), and +then be passed to the bpf_asm tool for generating opcodes, output that xt_bpf +and cls_bpf understands and can directly be loaded with. Example with above +ARP code: + +$ ./bpf_asm foo +4,40 0 0 12,21 0 1 2054,6 0 0 4294967295,6 0 0 0, + +In copy and paste C-like output: + +$ ./bpf_asm -c foo +{ 0x28, 0, 0, 0x0000000c }, +{ 0x15, 0, 1, 0x00000806 }, +{ 0x06, 0, 0, 0xffffffff }, +{ 0x06, 0, 0, 0000000000 }, + +In particular, as usage with xt_bpf or cls_bpf can result in more complex BPF +filters that might not be obvious at first, it's good to test filters before +attaching to a live system. For that purpose, there's a small tool called +bpf_dbg under tools/net/ in the kernel source directory. This debugger allows +for testing BPF filters against given pcap files, single stepping through the +BPF code on the pcap's packets and to do BPF machine register dumps. + +Starting bpf_dbg is trivial and just requires issuing: + +# ./bpf_dbg + +In case input and output do not equal stdin/stdout, bpf_dbg takes an +alternative stdin source as a first argument, and an alternative stdout +sink as a second one, e.g. `./bpf_dbg test_in.txt test_out.txt`. + +Other than that, a particular libreadline configuration can be set via +file "~/.bpf_dbg_init" and the command history is stored in the file +"~/.bpf_dbg_history". + +Interaction in bpf_dbg happens through a shell that also has auto-completion +support (follow-up example commands starting with '>' denote bpf_dbg shell). +The usual workflow would be to ... + +> load bpf 6,40 0 0 12,21 0 3 2048,48 0 0 23,21 0 1 1,6 0 0 65535,6 0 0 0 + Loads a BPF filter from standard output of bpf_asm, or transformed via + e.g. `tcpdump -iem1 -ddd port 22 | tr '\n' ','`. Note that for JIT + debugging (next section), this command creates a temporary socket and + loads the BPF code into the kernel. Thus, this will also be useful for + JIT developers. + +> load pcap foo.pcap + Loads standard tcpdump pcap file. + +> run [] +bpf passes:1 fails:9 + Runs through all packets from a pcap to account how many passes and fails + the filter will generate. A limit of packets to traverse can be given. + +> disassemble +l0: ldh [12] +l1: jeq #0x800, l2, l5 +l2: ldb [23] +l3: jeq #0x1, l4, l5 +l4: ret #0xffff +l5: ret #0 + Prints out BPF code disassembly. + +> dump +/* { op, jt, jf, k }, */ +{ 0x28, 0, 0, 0x0000000c }, +{ 0x15, 0, 3, 0x00000800 }, +{ 0x30, 0, 0, 0x00000017 }, +{ 0x15, 0, 1, 0x00000001 }, +{ 0x06, 0, 0, 0x0000ffff }, +{ 0x06, 0, 0, 0000000000 }, + Prints out C-style BPF code dump. + +> breakpoint 0 +breakpoint at: l0: ldh [12] +> breakpoint 1 +breakpoint at: l1: jeq #0x800, l2, l5 + ... + Sets breakpoints at particular BPF instructions. Issuing a `run` command + will walk through the pcap file continuing from the current packet and + break when a breakpoint is being hit (another `run` will continue from + the currently active breakpoint executing next instructions): + + > run + -- register dump -- + pc: [0] <-- program counter + code: [40] jt[0] jf[0] k[12] <-- plain BPF code of current instruction + curr: l0: ldh [12] <-- disassembly of current instruction + A: [00000000][0] <-- content of A (hex, decimal) + X: [00000000][0] <-- content of X (hex, decimal) + M[0,15]: [00000000][0] <-- folded content of M (hex, decimal) + -- packet dump -- <-- Current packet from pcap (hex) + len: 42 + 0: 00 19 cb 55 55 a4 00 14 a4 43 78 69 08 06 00 01 + 16: 08 00 06 04 00 01 00 14 a4 43 78 69 0a 3b 01 26 + 32: 00 00 00 00 00 00 0a 3b 01 01 + (breakpoint) + > + +> breakpoint +breakpoints: 0 1 + Prints currently set breakpoints. + +> step [-, +] + Performs single stepping through the BPF program from the current pc + offset. Thus, on each step invocation, above register dump is issued. + This can go forwards and backwards in time, a plain `step` will break + on the next BPF instruction, thus +1. (No `run` needs to be issued here.) + +> select + Selects a given packet from the pcap file to continue from. Thus, on + the next `run` or `step`, the BPF program is being evaluated against + the user pre-selected packet. Numbering starts just as in Wireshark + with index 1. + +> quit +# + Exits bpf_dbg. + +JIT compiler +------------ + +The Linux kernel has a built-in BPF JIT compiler for x86_64, SPARC, PowerPC, +ARM and s390 and can be enabled through CONFIG_BPF_JIT. The JIT compiler is +transparently invoked for each attached filter from user space or for internal +kernel users if it has been previously enabled by root: + + echo 1 > /proc/sys/net/core/bpf_jit_enable + +For JIT developers, doing audits etc, each compile run can output the generated +opcode image into the kernel log via: + + echo 2 > /proc/sys/net/core/bpf_jit_enable + +Example output from dmesg: + +[ 3389.935842] flen=6 proglen=70 pass=3 image=ffffffffa0069c8f +[ 3389.935847] JIT code: 00000000: 55 48 89 e5 48 83 ec 60 48 89 5d f8 44 8b 4f 68 +[ 3389.935849] JIT code: 00000010: 44 2b 4f 6c 4c 8b 87 d8 00 00 00 be 0c 00 00 00 +[ 3389.935850] JIT code: 00000020: e8 1d 94 ff e0 3d 00 08 00 00 75 16 be 17 00 00 +[ 3389.935851] JIT code: 00000030: 00 e8 28 94 ff e0 83 f8 01 75 07 b8 ff ff 00 00 +[ 3389.935852] JIT code: 00000040: eb 02 31 c0 c9 c3 + +In the kernel source tree under tools/net/, there's bpf_jit_disasm for +generating disassembly out of the kernel log's hexdump: + +# ./bpf_jit_disasm +70 bytes emitted from JIT compiler (pass:3, flen:6) +ffffffffa0069c8f + : + 0: push %rbp + 1: mov %rsp,%rbp + 4: sub $0x60,%rsp + 8: mov %rbx,-0x8(%rbp) + c: mov 0x68(%rdi),%r9d + 10: sub 0x6c(%rdi),%r9d + 14: mov 0xd8(%rdi),%r8 + 1b: mov $0xc,%esi + 20: callq 0xffffffffe0ff9442 + 25: cmp $0x800,%eax + 2a: jne 0x0000000000000042 + 2c: mov $0x17,%esi + 31: callq 0xffffffffe0ff945e + 36: cmp $0x1,%eax + 39: jne 0x0000000000000042 + 3b: mov $0xffff,%eax + 40: jmp 0x0000000000000044 + 42: xor %eax,%eax + 44: leaveq + 45: retq + +Issuing option `-o` will "annotate" opcodes to resulting assembler +instructions, which can be very useful for JIT developers: + +# ./bpf_jit_disasm -o +70 bytes emitted from JIT compiler (pass:3, flen:6) +ffffffffa0069c8f + : + 0: push %rbp + 55 + 1: mov %rsp,%rbp + 48 89 e5 + 4: sub $0x60,%rsp + 48 83 ec 60 + 8: mov %rbx,-0x8(%rbp) + 48 89 5d f8 + c: mov 0x68(%rdi),%r9d + 44 8b 4f 68 + 10: sub 0x6c(%rdi),%r9d + 44 2b 4f 6c + 14: mov 0xd8(%rdi),%r8 + 4c 8b 87 d8 00 00 00 + 1b: mov $0xc,%esi + be 0c 00 00 00 + 20: callq 0xffffffffe0ff9442 + e8 1d 94 ff e0 + 25: cmp $0x800,%eax + 3d 00 08 00 00 + 2a: jne 0x0000000000000042 + 75 16 + 2c: mov $0x17,%esi + be 17 00 00 00 + 31: callq 0xffffffffe0ff945e + e8 28 94 ff e0 + 36: cmp $0x1,%eax + 83 f8 01 + 39: jne 0x0000000000000042 + 75 07 + 3b: mov $0xffff,%eax + b8 ff ff 00 00 + 40: jmp 0x0000000000000044 + eb 02 + 42: xor %eax,%eax + 31 c0 + 44: leaveq + c9 + 45: retq + c3 + +For BPF JIT developers, bpf_jit_disasm, bpf_asm and bpf_dbg provides a useful +toolchain for developing and testing the kernel's JIT compiler. + +Misc +---- + +Also trinity, the Linux syscall fuzzer, has built-in support for BPF and +SECCOMP-BPF kernel fuzzing. + +Written by +---------- + +The document was written in the hope that it is found useful and in order +to give potential BPF hackers or security auditors a better overview of +the underlying architecture. + +Jay Schulist +Daniel Borkmann diff --git a/tools/net/Makefile b/tools/net/Makefile index b4444d53b73f..004cd74734b6 100644 --- a/tools/net/Makefile +++ b/tools/net/Makefile @@ -1,15 +1,34 @@ prefix = /usr CC = gcc +LEX = flex +YACC = bison -all : bpf_jit_disasm +%.yacc.c: %.y + $(YACC) -o $@ -d $< + +%.lex.c: %.l + $(LEX) -o $@ $< + +all : bpf_jit_disasm bpf_dbg bpf_asm bpf_jit_disasm : CFLAGS = -Wall -O2 bpf_jit_disasm : LDLIBS = -lopcodes -lbfd -ldl bpf_jit_disasm : bpf_jit_disasm.o +bpf_dbg : CFLAGS = -Wall -O2 +bpf_dbg : LDLIBS = -lreadline +bpf_dbg : bpf_dbg.o + +bpf_asm : CFLAGS = -Wall -O2 -I. +bpf_asm : LDLIBS = +bpf_asm : bpf_asm.o bpf_exp.yacc.o bpf_exp.lex.o +bpf_exp.lex.o : bpf_exp.yacc.c + clean : - rm -rf *.o bpf_jit_disasm + rm -rf *.o bpf_jit_disasm bpf_dbg bpf_asm bpf_exp.yacc.* bpf_exp.lex.* install : install bpf_jit_disasm $(prefix)/bin/bpf_jit_disasm + install bpf_dbg $(prefix)/bin/bpf_dbg + install bpf_asm $(prefix)/bin/bpf_asm diff --git a/tools/net/bpf_asm.c b/tools/net/bpf_asm.c new file mode 100644 index 000000000000..c15aef097b04 --- /dev/null +++ b/tools/net/bpf_asm.c @@ -0,0 +1,52 @@ +/* + * Minimal BPF assembler + * + * Instead of libpcap high-level filter expressions, it can be quite + * useful to define filters in low-level BPF assembler (that is kept + * close to Steven McCanne and Van Jacobson's original BPF paper). + * In particular for BPF JIT implementors, JIT security auditors, or + * just for defining BPF expressions that contain extensions which are + * not supported by compilers. + * + * How to get into it: + * + * 1) read Documentation/networking/filter.txt + * 2) Run `bpf_asm [-c] ` to translate into binary + * blob that is loadable with xt_bpf, cls_bpf et al. Note: -c will + * pretty print a C-like construct. + * + * Copyright 2013 Daniel Borkmann + * Licensed under the GNU General Public License, version 2.0 (GPLv2) + */ + +#include +#include +#include + +extern void bpf_asm_compile(FILE *fp, bool cstyle); + +int main(int argc, char **argv) +{ + FILE *fp = stdin; + bool cstyle = false; + int i; + + for (i = 1; i < argc; i++) { + if (!strncmp("-c", argv[i], 2)) { + cstyle = true; + continue; + } + + fp = fopen(argv[i], "r"); + if (!fp) { + fp = stdin; + continue; + } + + break; + } + + bpf_asm_compile(fp, cstyle); + + return 0; +} diff --git a/tools/net/bpf_dbg.c b/tools/net/bpf_dbg.c new file mode 100644 index 000000000000..0fdcb707a2e7 --- /dev/null +++ b/tools/net/bpf_dbg.c @@ -0,0 +1,1404 @@ +/* + * Minimal BPF debugger + * + * Minimal BPF debugger that mimics the kernel's engine (w/o extensions) + * and allows for single stepping through selected packets from a pcap + * with a provided user filter in order to facilitate verification of a + * BPF program. Besides others, this is useful to verify BPF programs + * before attaching to a live system, and can be used in socket filters, + * cls_bpf, xt_bpf, team driver and e.g. PTP code; in particular when a + * single more complex BPF program is being used. Reasons for a more + * complex BPF program are likely primarily to optimize execution time + * for making a verdict when multiple simple BPF programs are combined + * into one in order to prevent parsing same headers multiple times. + * + * More on how to debug BPF opcodes see Documentation/networking/filter.txt + * which is the main document on BPF. Mini howto for getting started: + * + * 1) `./bpf_dbg` to enter the shell (shell cmds denoted with '>'): + * 2) > load bpf 6,40 0 0 12,21 0 3 20... (output from `bpf_asm` or + * `tcpdump -iem1 -ddd port 22 | tr '\n' ','` to load as filter) + * 3) > load pcap foo.pcap + * 4) > run /disassemble/dump/quit (self-explanatory) + * 5) > breakpoint 2 (sets bp at loaded BPF insns 2, do `run` then; + * multiple bps can be set, of course, a call to `breakpoint` + * w/o args shows currently loaded bps, `breakpoint reset` for + * resetting all breakpoints) + * 6) > select 3 (`run` etc will start from the 3rd packet in the pcap) + * 7) > step [-, +] (performs single stepping through the BPF) + * + * Copyright 2013 Daniel Borkmann + * Licensed under the GNU General Public License, version 2.0 (GPLv2) + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define TCPDUMP_MAGIC 0xa1b2c3d4 + +#define BPF_LDX_B (BPF_LDX | BPF_B) +#define BPF_LDX_W (BPF_LDX | BPF_W) +#define BPF_JMP_JA (BPF_JMP | BPF_JA) +#define BPF_JMP_JEQ (BPF_JMP | BPF_JEQ) +#define BPF_JMP_JGT (BPF_JMP | BPF_JGT) +#define BPF_JMP_JGE (BPF_JMP | BPF_JGE) +#define BPF_JMP_JSET (BPF_JMP | BPF_JSET) +#define BPF_ALU_ADD (BPF_ALU | BPF_ADD) +#define BPF_ALU_SUB (BPF_ALU | BPF_SUB) +#define BPF_ALU_MUL (BPF_ALU | BPF_MUL) +#define BPF_ALU_DIV (BPF_ALU | BPF_DIV) +#define BPF_ALU_MOD (BPF_ALU | BPF_MOD) +#define BPF_ALU_NEG (BPF_ALU | BPF_NEG) +#define BPF_ALU_AND (BPF_ALU | BPF_AND) +#define BPF_ALU_OR (BPF_ALU | BPF_OR) +#define BPF_ALU_XOR (BPF_ALU | BPF_XOR) +#define BPF_ALU_LSH (BPF_ALU | BPF_LSH) +#define BPF_ALU_RSH (BPF_ALU | BPF_RSH) +#define BPF_MISC_TAX (BPF_MISC | BPF_TAX) +#define BPF_MISC_TXA (BPF_MISC | BPF_TXA) +#define BPF_LD_B (BPF_LD | BPF_B) +#define BPF_LD_H (BPF_LD | BPF_H) +#define BPF_LD_W (BPF_LD | BPF_W) + +#ifndef array_size +# define array_size(x) (sizeof(x) / sizeof((x)[0])) +#endif + +#ifndef __check_format_printf +# define __check_format_printf(pos_fmtstr, pos_fmtargs) \ + __attribute__ ((format (printf, (pos_fmtstr), (pos_fmtargs)))) +#endif + +#define CMD(_name, _func) { .name = _name, .func = _func, } +#define OP(_op, _name) [_op] = _name + +enum { + CMD_OK, + CMD_ERR, + CMD_EX, +}; + +struct shell_cmd { + const char *name; + int (*func)(char *args); +}; + +struct pcap_filehdr { + uint32_t magic; + uint16_t version_major; + uint16_t version_minor; + int32_t thiszone; + uint32_t sigfigs; + uint32_t snaplen; + uint32_t linktype; +}; + +struct pcap_timeval { + int32_t tv_sec; + int32_t tv_usec; +}; + +struct pcap_pkthdr { + struct pcap_timeval ts; + uint32_t caplen; + uint32_t len; +}; + +struct bpf_regs { + uint32_t A; + uint32_t X; + uint32_t M[BPF_MEMWORDS]; + uint32_t R; + bool Rs; + uint16_t Pc; +}; + +static struct sock_filter bpf_image[BPF_MAXINSNS + 1]; +static unsigned int bpf_prog_len = 0; + +static int bpf_breakpoints[64]; +static struct bpf_regs bpf_regs[BPF_MAXINSNS + 1]; +static struct bpf_regs bpf_curr; +static unsigned int bpf_regs_len = 0; + +static int pcap_fd = -1; +static unsigned int pcap_packet = 0; +static size_t pcap_map_size = 0; +static char *pcap_ptr_va_start, *pcap_ptr_va_curr; + +static const char * const op_table[] = { + OP(BPF_ST, "st"), + OP(BPF_STX, "stx"), + OP(BPF_LD_B, "ldb"), + OP(BPF_LD_H, "ldh"), + OP(BPF_LD_W, "ld"), + OP(BPF_LDX, "ldx"), + OP(BPF_LDX_B, "ldxb"), + OP(BPF_JMP_JA, "ja"), + OP(BPF_JMP_JEQ, "jeq"), + OP(BPF_JMP_JGT, "jgt"), + OP(BPF_JMP_JGE, "jge"), + OP(BPF_JMP_JSET, "jset"), + OP(BPF_ALU_ADD, "add"), + OP(BPF_ALU_SUB, "sub"), + OP(BPF_ALU_MUL, "mul"), + OP(BPF_ALU_DIV, "div"), + OP(BPF_ALU_MOD, "mod"), + OP(BPF_ALU_NEG, "neg"), + OP(BPF_ALU_AND, "and"), + OP(BPF_ALU_OR, "or"), + OP(BPF_ALU_XOR, "xor"), + OP(BPF_ALU_LSH, "lsh"), + OP(BPF_ALU_RSH, "rsh"), + OP(BPF_MISC_TAX, "tax"), + OP(BPF_MISC_TXA, "txa"), + OP(BPF_RET, "ret"), +}; + +static __check_format_printf(1, 2) int rl_printf(const char *fmt, ...) +{ + int ret; + va_list vl; + + va_start(vl, fmt); + ret = vfprintf(rl_outstream, fmt, vl); + va_end(vl); + + return ret; +} + +static int matches(const char *cmd, const char *pattern) +{ + int len = strlen(cmd); + + if (len > strlen(pattern)) + return -1; + + return memcmp(pattern, cmd, len); +} + +static void hex_dump(const uint8_t *buf, size_t len) +{ + int i; + + rl_printf("%3u: ", 0); + for (i = 0; i < len; i++) { + if (i && !(i % 16)) + rl_printf("\n%3u: ", i); + rl_printf("%02x ", buf[i]); + } + rl_printf("\n"); +} + +static bool bpf_prog_loaded(void) +{ + if (bpf_prog_len == 0) + rl_printf("no bpf program loaded!\n"); + + return bpf_prog_len > 0; +} + +static void bpf_disasm(const struct sock_filter f, unsigned int i) +{ + const char *op, *fmt; + int val = f.k; + char buf[256]; + + switch (f.code) { + case BPF_RET | BPF_K: + op = op_table[BPF_RET]; + fmt = "#%#x"; + break; + case BPF_RET | BPF_A: + op = op_table[BPF_RET]; + fmt = "a"; + break; + case BPF_RET | BPF_X: + op = op_table[BPF_RET]; + fmt = "x"; + break; + case BPF_MISC_TAX: + op = op_table[BPF_MISC_TAX]; + fmt = ""; + break; + case BPF_MISC_TXA: + op = op_table[BPF_MISC_TXA]; + fmt = ""; + break; + case BPF_ST: + op = op_table[BPF_ST]; + fmt = "M[%d]"; + break; + case BPF_STX: + op = op_table[BPF_STX]; + fmt = "M[%d]"; + break; + case BPF_LD_W | BPF_ABS: + op = op_table[BPF_LD_W]; + fmt = "[%d]"; + break; + case BPF_LD_H | BPF_ABS: + op = op_table[BPF_LD_H]; + fmt = "[%d]"; + break; + case BPF_LD_B | BPF_ABS: + op = op_table[BPF_LD_B]; + fmt = "[%d]"; + break; + case BPF_LD_W | BPF_LEN: + op = op_table[BPF_LD_W]; + fmt = "#len"; + break; + case BPF_LD_W | BPF_IND: + op = op_table[BPF_LD_W]; + fmt = "[x+%d]"; + break; + case BPF_LD_H | BPF_IND: + op = op_table[BPF_LD_H]; + fmt = "[x+%d]"; + break; + case BPF_LD_B | BPF_IND: + op = op_table[BPF_LD_B]; + fmt = "[x+%d]"; + break; + case BPF_LD | BPF_IMM: + op = op_table[BPF_LD_W]; + fmt = "#%#x"; + break; + case BPF_LDX | BPF_IMM: + op = op_table[BPF_LDX]; + fmt = "#%#x"; + break; + case BPF_LDX_B | BPF_MSH: + op = op_table[BPF_LDX_B]; + fmt = "4*([%d]&0xf)"; + break; + case BPF_LD | BPF_MEM: + op = op_table[BPF_LD_W]; + fmt = "M[%d]"; + break; + case BPF_LDX | BPF_MEM: + op = op_table[BPF_LDX]; + fmt = "M[%d]"; + break; + case BPF_JMP_JA: + op = op_table[BPF_JMP_JA]; + fmt = "%d"; + val = i + 1 + f.k; + break; + case BPF_JMP_JGT | BPF_X: + op = op_table[BPF_JMP_JGT]; + fmt = "x"; + break; + case BPF_JMP_JGT | BPF_K: + op = op_table[BPF_JMP_JGT]; + fmt = "#%#x"; + break; + case BPF_JMP_JGE | BPF_X: + op = op_table[BPF_JMP_JGE]; + fmt = "x"; + break; + case BPF_JMP_JGE | BPF_K: + op = op_table[BPF_JMP_JGE]; + fmt = "#%#x"; + break; + case BPF_JMP_JEQ | BPF_X: + op = op_table[BPF_JMP_JEQ]; + fmt = "x"; + break; + case BPF_JMP_JEQ | BPF_K: + op = op_table[BPF_JMP_JEQ]; + fmt = "#%#x"; + break; + case BPF_JMP_JSET | BPF_X: + op = op_table[BPF_JMP_JSET]; + fmt = "x"; + break; + case BPF_JMP_JSET | BPF_K: + op = op_table[BPF_JMP_JSET]; + fmt = "#%#x"; + break; + case BPF_ALU_NEG: + op = op_table[BPF_ALU_NEG]; + fmt = ""; + break; + case BPF_ALU_LSH | BPF_X: + op = op_table[BPF_ALU_LSH]; + fmt = "x"; + break; + case BPF_ALU_LSH | BPF_K: + op = op_table[BPF_ALU_LSH]; + fmt = "#%d"; + break; + case BPF_ALU_RSH | BPF_X: + op = op_table[BPF_ALU_RSH]; + fmt = "x"; + break; + case BPF_ALU_RSH | BPF_K: + op = op_table[BPF_ALU_RSH]; + fmt = "#%d"; + break; + case BPF_ALU_ADD | BPF_X: + op = op_table[BPF_ALU_ADD]; + fmt = "x"; + break; + case BPF_ALU_ADD | BPF_K: + op = op_table[BPF_ALU_ADD]; + fmt = "#%d"; + break; + case BPF_ALU_SUB | BPF_X: + op = op_table[BPF_ALU_SUB]; + fmt = "x"; + break; + case BPF_ALU_SUB | BPF_K: + op = op_table[BPF_ALU_SUB]; + fmt = "#%d"; + break; + case BPF_ALU_MUL | BPF_X: + op = op_table[BPF_ALU_MUL]; + fmt = "x"; + break; + case BPF_ALU_MUL | BPF_K: + op = op_table[BPF_ALU_MUL]; + fmt = "#%d"; + break; + case BPF_ALU_DIV | BPF_X: + op = op_table[BPF_ALU_DIV]; + fmt = "x"; + break; + case BPF_ALU_DIV | BPF_K: + op = op_table[BPF_ALU_DIV]; + fmt = "#%d"; + break; + case BPF_ALU_MOD | BPF_X: + op = op_table[BPF_ALU_MOD]; + fmt = "x"; + break; + case BPF_ALU_MOD | BPF_K: + op = op_table[BPF_ALU_MOD]; + fmt = "#%d"; + break; + case BPF_ALU_AND | BPF_X: + op = op_table[BPF_ALU_AND]; + fmt = "x"; + break; + case BPF_ALU_AND | BPF_K: + op = op_table[BPF_ALU_AND]; + fmt = "#%#x"; + break; + case BPF_ALU_OR | BPF_X: + op = op_table[BPF_ALU_OR]; + fmt = "x"; + break; + case BPF_ALU_OR | BPF_K: + op = op_table[BPF_ALU_OR]; + fmt = "#%#x"; + break; + case BPF_ALU_XOR | BPF_X: + op = op_table[BPF_ALU_XOR]; + fmt = "x"; + break; + case BPF_ALU_XOR | BPF_K: + op = op_table[BPF_ALU_XOR]; + fmt = "#%#x"; + break; + default: + op = "nosup"; + fmt = "%#x"; + val = f.code; + break; + } + + memset(buf, 0, sizeof(buf)); + snprintf(buf, sizeof(buf), fmt, val); + buf[sizeof(buf) - 1] = 0; + + if ((BPF_CLASS(f.code) == BPF_JMP && BPF_OP(f.code) != BPF_JA)) + rl_printf("l%d:\t%s %s, l%d, l%d\n", i, op, buf, + i + 1 + f.jt, i + 1 + f.jf); + else + rl_printf("l%d:\t%s %s\n", i, op, buf); +} + +static void bpf_dump_curr(struct bpf_regs *r, struct sock_filter *f) +{ + int i, m = 0; + + rl_printf("pc: [%u]\n", r->Pc); + rl_printf("code: [%u] jt[%u] jf[%u] k[%u]\n", + f->code, f->jt, f->jf, f->k); + rl_printf("curr: "); + bpf_disasm(*f, r->Pc); + + if (f->jt || f->jf) { + rl_printf("jt: "); + bpf_disasm(*(f + f->jt + 1), r->Pc + f->jt + 1); + rl_printf("jf: "); + bpf_disasm(*(f + f->jf + 1), r->Pc + f->jf + 1); + } + + rl_printf("A: [%#08x][%u]\n", r->A, r->A); + rl_printf("X: [%#08x][%u]\n", r->X, r->X); + if (r->Rs) + rl_printf("ret: [%#08x][%u]!\n", r->R, r->R); + + for (i = 0; i < BPF_MEMWORDS; i++) { + if (r->M[i]) { + m++; + rl_printf("M[%d]: [%#08x][%u]\n", i, r->M[i], r->M[i]); + } + } + if (m == 0) + rl_printf("M[0,%d]: [%#08x][%u]\n", BPF_MEMWORDS - 1, 0, 0); +} + +static void bpf_dump_pkt(uint8_t *pkt, uint32_t pkt_caplen, uint32_t pkt_len) +{ + if (pkt_caplen != pkt_len) + rl_printf("cap: %u, len: %u\n", pkt_caplen, pkt_len); + else + rl_printf("len: %u\n", pkt_len); + + hex_dump(pkt, pkt_caplen); +} + +static void bpf_disasm_all(const struct sock_filter *f, unsigned int len) +{ + unsigned int i; + + for (i = 0; i < len; i++) + bpf_disasm(f[i], i); +} + +static void bpf_dump_all(const struct sock_filter *f, unsigned int len) +{ + unsigned int i; + + rl_printf("/* { op, jt, jf, k }, */\n"); + for (i = 0; i < len; i++) + rl_printf("{ %#04x, %2u, %2u, %#010x },\n", + f[i].code, f[i].jt, f[i].jf, f[i].k); +} + +static bool bpf_runnable(struct sock_filter *f, unsigned int len) +{ + int sock, ret, i; + struct sock_fprog bpf = { + .filter = f, + .len = len, + }; + + sock = socket(AF_INET, SOCK_DGRAM, 0); + if (sock < 0) { + rl_printf("cannot open socket!\n"); + return false; + } + ret = setsockopt(sock, SOL_SOCKET, SO_ATTACH_FILTER, &bpf, sizeof(bpf)); + if (ret < 0) { + rl_printf("program not allowed to run by kernel!\n"); + return false; + } + close(sock); + for (i = 0; i < len; i++) { + if (BPF_CLASS(f[i].code) == BPF_LD && + f[i].k > SKF_AD_OFF) { + rl_printf("extensions currently not supported!\n"); + return false; + } + } + + return true; +} + +static void bpf_reset_breakpoints(void) +{ + int i; + + for (i = 0; i < array_size(bpf_breakpoints); i++) + bpf_breakpoints[i] = -1; +} + +static void bpf_set_breakpoints(unsigned int where) +{ + int i; + bool set = false; + + for (i = 0; i < array_size(bpf_breakpoints); i++) { + if (bpf_breakpoints[i] == (int) where) { + rl_printf("breakpoint already set!\n"); + set = true; + break; + } + + if (bpf_breakpoints[i] == -1 && set == false) { + bpf_breakpoints[i] = where; + set = true; + } + } + + if (!set) + rl_printf("too many breakpoints set, reset first!\n"); +} + +static void bpf_dump_breakpoints(void) +{ + int i; + + rl_printf("breakpoints: "); + + for (i = 0; i < array_size(bpf_breakpoints); i++) { + if (bpf_breakpoints[i] < 0) + continue; + rl_printf("%d ", bpf_breakpoints[i]); + } + + rl_printf("\n"); +} + +static void bpf_reset(void) +{ + bpf_regs_len = 0; + + memset(bpf_regs, 0, sizeof(bpf_regs)); + memset(&bpf_curr, 0, sizeof(bpf_curr)); +} + +static void bpf_safe_regs(void) +{ + memcpy(&bpf_regs[bpf_regs_len++], &bpf_curr, sizeof(bpf_curr)); +} + +static bool bpf_restore_regs(int off) +{ + unsigned int index = bpf_regs_len - 1 + off; + + if (index == 0) { + bpf_reset(); + return true; + } else if (index < bpf_regs_len) { + memcpy(&bpf_curr, &bpf_regs[index], sizeof(bpf_curr)); + bpf_regs_len = index; + return true; + } else { + rl_printf("reached bottom of register history stack!\n"); + return false; + } +} + +static uint32_t extract_u32(uint8_t *pkt, uint32_t off) +{ + uint32_t r; + + memcpy(&r, &pkt[off], sizeof(r)); + + return ntohl(r); +} + +static uint16_t extract_u16(uint8_t *pkt, uint32_t off) +{ + uint16_t r; + + memcpy(&r, &pkt[off], sizeof(r)); + + return ntohs(r); +} + +static uint8_t extract_u8(uint8_t *pkt, uint32_t off) +{ + return pkt[off]; +} + +static void set_return(struct bpf_regs *r) +{ + r->R = 0; + r->Rs = true; +} + +static void bpf_single_step(struct bpf_regs *r, struct sock_filter *f, + uint8_t *pkt, uint32_t pkt_caplen, + uint32_t pkt_len) +{ + uint32_t K = f->k; + int d; + + switch (f->code) { + case BPF_RET | BPF_K: + r->R = K; + r->Rs = true; + break; + case BPF_RET | BPF_A: + r->R = r->A; + r->Rs = true; + break; + case BPF_RET | BPF_X: + r->R = r->X; + r->Rs = true; + break; + case BPF_MISC_TAX: + r->X = r->A; + break; + case BPF_MISC_TXA: + r->A = r->X; + break; + case BPF_ST: + r->M[K] = r->A; + break; + case BPF_STX: + r->M[K] = r->X; + break; + case BPF_LD_W | BPF_ABS: + d = pkt_caplen - K; + if (d >= sizeof(uint32_t)) + r->A = extract_u32(pkt, K); + else + set_return(r); + break; + case BPF_LD_H | BPF_ABS: + d = pkt_caplen - K; + if (d >= sizeof(uint16_t)) + r->A = extract_u16(pkt, K); + else + set_return(r); + break; + case BPF_LD_B | BPF_ABS: + d = pkt_caplen - K; + if (d >= sizeof(uint8_t)) + r->A = extract_u8(pkt, K); + else + set_return(r); + break; + case BPF_LD_W | BPF_IND: + d = pkt_caplen - (r->X + K); + if (d >= sizeof(uint32_t)) + r->A = extract_u32(pkt, r->X + K); + break; + case BPF_LD_H | BPF_IND: + d = pkt_caplen - (r->X + K); + if (d >= sizeof(uint16_t)) + r->A = extract_u16(pkt, r->X + K); + else + set_return(r); + break; + case BPF_LD_B | BPF_IND: + d = pkt_caplen - (r->X + K); + if (d >= sizeof(uint8_t)) + r->A = extract_u8(pkt, r->X + K); + else + set_return(r); + break; + case BPF_LDX_B | BPF_MSH: + d = pkt_caplen - K; + if (d >= sizeof(uint8_t)) { + r->X = extract_u8(pkt, K); + r->X = (r->X & 0xf) << 2; + } else + set_return(r); + break; + case BPF_LD_W | BPF_LEN: + r->A = pkt_len; + break; + case BPF_LDX_W | BPF_LEN: + r->A = pkt_len; + break; + case BPF_LD | BPF_IMM: + r->A = K; + break; + case BPF_LDX | BPF_IMM: + r->X = K; + break; + case BPF_LD | BPF_MEM: + r->A = r->M[K]; + break; + case BPF_LDX | BPF_MEM: + r->X = r->M[K]; + break; + case BPF_JMP_JA: + r->Pc += K; + break; + case BPF_JMP_JGT | BPF_X: + r->Pc += r->A > r->X ? f->jt : f->jf; + break; + case BPF_JMP_JGT | BPF_K: + r->Pc += r->A > K ? f->jt : f->jf; + break; + case BPF_JMP_JGE | BPF_X: + r->Pc += r->A >= r->X ? f->jt : f->jf; + break; + case BPF_JMP_JGE | BPF_K: + r->Pc += r->A >= K ? f->jt : f->jf; + break; + case BPF_JMP_JEQ | BPF_X: + r->Pc += r->A == r->X ? f->jt : f->jf; + break; + case BPF_JMP_JEQ | BPF_K: + r->Pc += r->A == K ? f->jt : f->jf; + break; + case BPF_JMP_JSET | BPF_X: + r->Pc += r->A & r->X ? f->jt : f->jf; + break; + case BPF_JMP_JSET | BPF_K: + r->Pc += r->A & K ? f->jt : f->jf; + break; + case BPF_ALU_NEG: + r->A = -r->A; + break; + case BPF_ALU_LSH | BPF_X: + r->A <<= r->X; + break; + case BPF_ALU_LSH | BPF_K: + r->A <<= K; + break; + case BPF_ALU_RSH | BPF_X: + r->A >>= r->X; + break; + case BPF_ALU_RSH | BPF_K: + r->A >>= K; + break; + case BPF_ALU_ADD | BPF_X: + r->A += r->X; + break; + case BPF_ALU_ADD | BPF_K: + r->A += K; + break; + case BPF_ALU_SUB | BPF_X: + r->A -= r->X; + break; + case BPF_ALU_SUB | BPF_K: + r->A -= K; + break; + case BPF_ALU_MUL | BPF_X: + r->A *= r->X; + break; + case BPF_ALU_MUL | BPF_K: + r->A *= K; + break; + case BPF_ALU_DIV | BPF_X: + case BPF_ALU_MOD | BPF_X: + if (r->X == 0) { + set_return(r); + break; + } + goto do_div; + case BPF_ALU_DIV | BPF_K: + case BPF_ALU_MOD | BPF_K: + if (K == 0) { + set_return(r); + break; + } +do_div: + switch (f->code) { + case BPF_ALU_DIV | BPF_X: + r->A /= r->X; + break; + case BPF_ALU_DIV | BPF_K: + r->A /= K; + break; + case BPF_ALU_MOD | BPF_X: + r->A %= r->X; + break; + case BPF_ALU_MOD | BPF_K: + r->A %= K; + break; + } + break; + case BPF_ALU_AND | BPF_X: + r->A &= r->X; + break; + case BPF_ALU_AND | BPF_K: + r->A &= r->X; + break; + case BPF_ALU_OR | BPF_X: + r->A |= r->X; + break; + case BPF_ALU_OR | BPF_K: + r->A |= K; + break; + case BPF_ALU_XOR | BPF_X: + r->A ^= r->X; + break; + case BPF_ALU_XOR | BPF_K: + r->A ^= K; + break; + } +} + +static bool bpf_pc_has_breakpoint(uint16_t pc) +{ + int i; + + for (i = 0; i < array_size(bpf_breakpoints); i++) { + if (bpf_breakpoints[i] < 0) + continue; + if (bpf_breakpoints[i] == pc) + return true; + } + + return false; +} + +static bool bpf_handle_breakpoint(struct bpf_regs *r, struct sock_filter *f, + uint8_t *pkt, uint32_t pkt_caplen, + uint32_t pkt_len) +{ + rl_printf("-- register dump --\n"); + bpf_dump_curr(r, &f[r->Pc]); + rl_printf("-- packet dump --\n"); + bpf_dump_pkt(pkt, pkt_caplen, pkt_len); + rl_printf("(breakpoint)\n"); + return true; +} + +static int bpf_run_all(struct sock_filter *f, uint16_t bpf_len, uint8_t *pkt, + uint32_t pkt_caplen, uint32_t pkt_len) +{ + bool stop = false; + + while (bpf_curr.Rs == false && stop == false) { + bpf_safe_regs(); + + if (bpf_pc_has_breakpoint(bpf_curr.Pc)) + stop = bpf_handle_breakpoint(&bpf_curr, f, pkt, + pkt_caplen, pkt_len); + + bpf_single_step(&bpf_curr, &f[bpf_curr.Pc], pkt, pkt_caplen, + pkt_len); + bpf_curr.Pc++; + } + + return stop ? -1 : bpf_curr.R; +} + +static int bpf_run_stepping(struct sock_filter *f, uint16_t bpf_len, + uint8_t *pkt, uint32_t pkt_caplen, + uint32_t pkt_len, int next) +{ + bool stop = false; + int i = 1; + + while (bpf_curr.Rs == false && stop == false) { + bpf_safe_regs(); + + if (i++ == next) + stop = bpf_handle_breakpoint(&bpf_curr, f, pkt, + pkt_caplen, pkt_len); + + bpf_single_step(&bpf_curr, &f[bpf_curr.Pc], pkt, pkt_caplen, + pkt_len); + bpf_curr.Pc++; + } + + return stop ? -1 : bpf_curr.R; +} + +static bool pcap_loaded(void) +{ + if (pcap_fd < 0) + rl_printf("no pcap file loaded!\n"); + + return pcap_fd >= 0; +} + +static struct pcap_pkthdr *pcap_curr_pkt(void) +{ + return (void *) pcap_ptr_va_curr; +} + +static bool pcap_next_pkt(void) +{ + struct pcap_pkthdr *hdr = pcap_curr_pkt(); + + if (pcap_ptr_va_curr + sizeof(*hdr) - + pcap_ptr_va_start >= pcap_map_size) + return false; + if (hdr->caplen == 0 || hdr->len == 0 || hdr->caplen > hdr->len) + return false; + if (pcap_ptr_va_curr + sizeof(*hdr) + hdr->caplen - + pcap_ptr_va_start >= pcap_map_size) + return false; + + pcap_ptr_va_curr += (sizeof(*hdr) + hdr->caplen); + return true; +} + +static void pcap_reset_pkt(void) +{ + pcap_ptr_va_curr = pcap_ptr_va_start + sizeof(struct pcap_filehdr); +} + +static int try_load_pcap(const char *file) +{ + struct pcap_filehdr *hdr; + struct stat sb; + int ret; + + pcap_fd = open(file, O_RDONLY); + if (pcap_fd < 0) { + rl_printf("cannot open pcap [%s]!\n", strerror(errno)); + return CMD_ERR; + } + + ret = fstat(pcap_fd, &sb); + if (ret < 0) { + rl_printf("cannot fstat pcap file!\n"); + return CMD_ERR; + } + + if (!S_ISREG(sb.st_mode)) { + rl_printf("not a regular pcap file, duh!\n"); + return CMD_ERR; + } + + pcap_map_size = sb.st_size; + if (pcap_map_size <= sizeof(struct pcap_filehdr)) { + rl_printf("pcap file too small!\n"); + return CMD_ERR; + } + + pcap_ptr_va_start = mmap(NULL, pcap_map_size, PROT_READ, + MAP_SHARED | MAP_LOCKED, pcap_fd, 0); + if (pcap_ptr_va_start == MAP_FAILED) { + rl_printf("mmap of file failed!"); + return CMD_ERR; + } + + hdr = (void *) pcap_ptr_va_start; + if (hdr->magic != TCPDUMP_MAGIC) { + rl_printf("wrong pcap magic!\n"); + return CMD_ERR; + } + + pcap_reset_pkt(); + + return CMD_OK; + +} + +static void try_close_pcap(void) +{ + if (pcap_fd >= 0) { + munmap(pcap_ptr_va_start, pcap_map_size); + close(pcap_fd); + + pcap_ptr_va_start = pcap_ptr_va_curr = NULL; + pcap_map_size = 0; + pcap_packet = 0; + pcap_fd = -1; + } +} + +static int cmd_load_bpf(char *bpf_string) +{ + char sp, *token, separator = ','; + unsigned short bpf_len, i = 0; + struct sock_filter tmp; + + bpf_prog_len = 0; + memset(bpf_image, 0, sizeof(bpf_image)); + + if (sscanf(bpf_string, "%hu%c", &bpf_len, &sp) != 2 || + sp != separator || bpf_len > BPF_MAXINSNS || bpf_len == 0) { + rl_printf("syntax error in head length encoding!\n"); + return CMD_ERR; + } + + token = bpf_string; + while ((token = strchr(token, separator)) && (++token)[0]) { + if (i >= bpf_len) { + rl_printf("program exceeds encoded length!\n"); + return CMD_ERR; + } + + if (sscanf(token, "%hu %hhu %hhu %u,", + &tmp.code, &tmp.jt, &tmp.jf, &tmp.k) != 4) { + rl_printf("syntax error at instruction %d!\n", i); + return CMD_ERR; + } + + bpf_image[i].code = tmp.code; + bpf_image[i].jt = tmp.jt; + bpf_image[i].jf = tmp.jf; + bpf_image[i].k = tmp.k; + + i++; + } + + if (i != bpf_len) { + rl_printf("syntax error exceeding encoded length!\n"); + return CMD_ERR; + } else + bpf_prog_len = bpf_len; + if (!bpf_runnable(bpf_image, bpf_prog_len)) + bpf_prog_len = 0; + + return CMD_OK; +} + +static int cmd_load_pcap(char *file) +{ + char *file_trim, *tmp; + + file_trim = strtok_r(file, " ", &tmp); + if (file_trim == NULL) + return CMD_ERR; + + try_close_pcap(); + + return try_load_pcap(file_trim); +} + +static int cmd_load(char *arg) +{ + char *subcmd, *cont, *tmp = strdup(arg); + int ret = CMD_OK; + + subcmd = strtok_r(tmp, " ", &cont); + if (subcmd == NULL) + goto out; + if (matches(subcmd, "bpf") == 0) { + bpf_reset(); + bpf_reset_breakpoints(); + + ret = cmd_load_bpf(cont); + } else if (matches(subcmd, "pcap") == 0) { + ret = cmd_load_pcap(cont); + } else { +out: + rl_printf("bpf : load bpf code\n"); + rl_printf("pcap : load pcap file\n"); + ret = CMD_ERR; + } + + free(tmp); + return ret; +} + +static int cmd_step(char *num) +{ + struct pcap_pkthdr *hdr; + int steps, ret; + + if (!bpf_prog_loaded() || !pcap_loaded()) + return CMD_ERR; + + steps = strtol(num, NULL, 10); + if (steps == 0 || strlen(num) == 0) + steps = 1; + if (steps < 0) { + if (!bpf_restore_regs(steps)) + return CMD_ERR; + steps = 1; + } + + hdr = pcap_curr_pkt(); + ret = bpf_run_stepping(bpf_image, bpf_prog_len, + (uint8_t *) hdr + sizeof(*hdr), + hdr->caplen, hdr->len, steps); + if (ret >= 0 || bpf_curr.Rs) { + bpf_reset(); + if (!pcap_next_pkt()) { + rl_printf("(going back to first packet)\n"); + pcap_reset_pkt(); + } else { + rl_printf("(next packet)\n"); + } + } + + return CMD_OK; +} + +static int cmd_select(char *num) +{ + unsigned int which, i; + struct pcap_pkthdr *hdr; + bool have_next = true; + + if (!pcap_loaded() || strlen(num) == 0) + return CMD_ERR; + + which = strtoul(num, NULL, 10); + if (which == 0) { + rl_printf("packet count starts with 1, clamping!\n"); + which = 1; + } + + pcap_reset_pkt(); + bpf_reset(); + + for (i = 0; i < which && (have_next = pcap_next_pkt()); i++) + /* noop */; + if (!have_next || (hdr = pcap_curr_pkt()) == NULL) { + rl_printf("no packet #%u available!\n", which); + pcap_reset_pkt(); + return CMD_ERR; + } + + return CMD_OK; +} + +static int cmd_breakpoint(char *subcmd) +{ + if (!bpf_prog_loaded()) + return CMD_ERR; + if (strlen(subcmd) == 0) + bpf_dump_breakpoints(); + else if (matches(subcmd, "reset") == 0) + bpf_reset_breakpoints(); + else { + unsigned int where = strtoul(subcmd, NULL, 10); + + if (where < bpf_prog_len) { + bpf_set_breakpoints(where); + rl_printf("breakpoint at: "); + bpf_disasm(bpf_image[where], where); + } + } + + return CMD_OK; +} + +static int cmd_run(char *num) +{ + static uint32_t pass = 0, fail = 0; + struct pcap_pkthdr *hdr; + bool has_limit = true; + int ret, pkts = 0, i = 0; + + if (!bpf_prog_loaded() || !pcap_loaded()) + return CMD_ERR; + + pkts = strtol(num, NULL, 10); + if (pkts == 0 || strlen(num) == 0) + has_limit = false; + + do { + hdr = pcap_curr_pkt(); + ret = bpf_run_all(bpf_image, bpf_prog_len, + (uint8_t *) hdr + sizeof(*hdr), + hdr->caplen, hdr->len); + if (ret > 0) + pass++; + else if (ret == 0) + fail++; + else + return CMD_OK; + bpf_reset(); + } while (pcap_next_pkt() && (!has_limit || (has_limit && ++i < pkts))); + + rl_printf("bpf passes:%u fails:%u\n", pass, fail); + + pcap_reset_pkt(); + bpf_reset(); + + pass = fail = 0; + return CMD_OK; +} + +static int cmd_disassemble(char *line_string) +{ + bool single_line = false; + unsigned long line; + + if (!bpf_prog_loaded()) + return CMD_ERR; + if (strlen(line_string) > 0 && + (line = strtoul(line_string, NULL, 10)) < bpf_prog_len) + single_line = true; + if (single_line) + bpf_disasm(bpf_image[line], line); + else + bpf_disasm_all(bpf_image, bpf_prog_len); + + return CMD_OK; +} + +static int cmd_dump(char *dontcare) +{ + if (!bpf_prog_loaded()) + return CMD_ERR; + + bpf_dump_all(bpf_image, bpf_prog_len); + + return CMD_OK; +} + +static int cmd_quit(char *dontcare) +{ + return CMD_EX; +} + +static const struct shell_cmd cmds[] = { + CMD("load", cmd_load), + CMD("select", cmd_select), + CMD("step", cmd_step), + CMD("run", cmd_run), + CMD("breakpoint", cmd_breakpoint), + CMD("disassemble", cmd_disassemble), + CMD("dump", cmd_dump), + CMD("quit", cmd_quit), +}; + +static int execf(char *arg) +{ + char *cmd, *cont, *tmp = strdup(arg); + int i, ret = 0, len; + + cmd = strtok_r(tmp, " ", &cont); + if (cmd == NULL) + goto out; + len = strlen(cmd); + for (i = 0; i < array_size(cmds); i++) { + if (len != strlen(cmds[i].name)) + continue; + if (strncmp(cmds[i].name, cmd, len) == 0) { + ret = cmds[i].func(cont); + break; + } + } +out: + free(tmp); + return ret; +} + +static char *shell_comp_gen(const char *buf, int state) +{ + static int list_index, len; + const char *name; + + if (!state) { + list_index = 0; + len = strlen(buf); + } + + for (; list_index < array_size(cmds); ) { + name = cmds[list_index].name; + list_index++; + + if (strncmp(name, buf, len) == 0) + return strdup(name); + } + + return NULL; +} + +static char **shell_completion(const char *buf, int start, int end) +{ + char **matches = NULL; + + if (start == 0) + matches = rl_completion_matches(buf, shell_comp_gen); + + return matches; +} + +static void intr_shell(int sig) +{ + if (rl_end) + rl_kill_line(-1, 0); + + rl_crlf(); + rl_refresh_line(0, 0); + rl_free_line_state(); +} + +static void init_shell(FILE *fin, FILE *fout) +{ + char file[128]; + + memset(file, 0, sizeof(file)); + snprintf(file, sizeof(file) - 1, + "%s/.bpf_dbg_history", getenv("HOME")); + + read_history(file); + + memset(file, 0, sizeof(file)); + snprintf(file, sizeof(file) - 1, + "%s/.bpf_dbg_init", getenv("HOME")); + + rl_instream = fin; + rl_outstream = fout; + + rl_readline_name = "bpf_dbg"; + rl_terminal_name = getenv("TERM"); + + rl_catch_signals = 0; + rl_catch_sigwinch = 1; + + rl_attempted_completion_function = shell_completion; + + rl_bind_key('\t', rl_complete); + + rl_bind_key_in_map('\t', rl_complete, emacs_meta_keymap); + rl_bind_key_in_map('\033', rl_complete, emacs_meta_keymap); + + rl_read_init_file(file); + rl_prep_terminal(0); + rl_set_signals(); + + signal(SIGINT, intr_shell); +} + +static void exit_shell(void) +{ + char file[128]; + + memset(file, 0, sizeof(file)); + snprintf(file, sizeof(file) - 1, + "%s/.bpf_dbg_history", getenv("HOME")); + + write_history(file); + clear_history(); + rl_deprep_terminal(); + + try_close_pcap(); +} + +static int run_shell_loop(FILE *fin, FILE *fout) +{ + char *buf; + int ret; + + init_shell(fin, fout); + + while ((buf = readline("> ")) != NULL) { + ret = execf(buf); + if (ret == CMD_EX) + break; + if (ret == CMD_OK && strlen(buf) > 0) + add_history(buf); + + free(buf); + } + + exit_shell(); + return 0; +} + +int main(int argc, char **argv) +{ + FILE *fin = NULL, *fout = NULL; + + if (argc >= 2) + fin = fopen(argv[1], "r"); + if (argc >= 3) + fout = fopen(argv[2], "w"); + + return run_shell_loop(fin ? : stdin, fout ? : stdout); +} diff --git a/tools/net/bpf_exp.l b/tools/net/bpf_exp.l new file mode 100644 index 000000000000..bf7be77ddd62 --- /dev/null +++ b/tools/net/bpf_exp.l @@ -0,0 +1,143 @@ +/* + * BPF asm code lexer + * + * This program is free software; you can distribute it and/or modify + * it under the terms of the GNU General Public License as published + * by the Free Software Foundation; either version 2 of the License, + * or (at your option) any later version. + * + * Syntax kept close to: + * + * Steven McCanne and Van Jacobson. 1993. The BSD packet filter: a new + * architecture for user-level packet capture. In Proceedings of the + * USENIX Winter 1993 Conference Proceedings on USENIX Winter 1993 + * Conference Proceedings (USENIX'93). USENIX Association, Berkeley, + * CA, USA, 2-2. + * + * Copyright 2013 Daniel Borkmann + * Licensed under the GNU General Public License, version 2.0 (GPLv2) + */ + +%{ + +#include +#include +#include + +#include "bpf_exp.yacc.h" + +extern void yyerror(const char *str); + +%} + +%option align +%option ecs + +%option nounput +%option noreject +%option noinput +%option noyywrap + +%option 8bit +%option caseless +%option yylineno + +%% + +"ldb" { return OP_LDB; } +"ldh" { return OP_LDH; } +"ld" { return OP_LD; } +"ldi" { return OP_LDI; } +"ldx" { return OP_LDX; } +"ldxi" { return OP_LDXI; } +"ldxb" { return OP_LDXB; } +"st" { return OP_ST; } +"stx" { return OP_STX; } +"jmp" { return OP_JMP; } +"ja" { return OP_JMP; } +"jeq" { return OP_JEQ; } +"jneq" { return OP_JNEQ; } +"jne" { return OP_JNEQ; } +"jlt" { return OP_JLT; } +"jle" { return OP_JLE; } +"jgt" { return OP_JGT; } +"jge" { return OP_JGE; } +"jset" { return OP_JSET; } +"add" { return OP_ADD; } +"sub" { return OP_SUB; } +"mul" { return OP_MUL; } +"div" { return OP_DIV; } +"mod" { return OP_MOD; } +"neg" { return OP_NEG; } +"and" { return OP_AND; } +"xor" { return OP_XOR; } +"or" { return OP_OR; } +"lsh" { return OP_LSH; } +"rsh" { return OP_RSH; } +"ret" { return OP_RET; } +"tax" { return OP_TAX; } +"txa" { return OP_TXA; } + +"#"?("len") { return K_PKT_LEN; } +"#"?("proto") { return K_PROTO; } +"#"?("type") { return K_TYPE; } +"#"?("poff") { return K_POFF; } +"#"?("ifidx") { return K_IFIDX; } +"#"?("nla") { return K_NLATTR; } +"#"?("nlan") { return K_NLATTR_NEST; } +"#"?("mark") { return K_MARK; } +"#"?("queue") { return K_QUEUE; } +"#"?("hatype") { return K_HATYPE; } +"#"?("rxhash") { return K_RXHASH; } +"#"?("cpu") { return K_CPU; } +"#"?("vlan_tci") { return K_VLANT; } +"#"?("vlan_pr") { return K_VLANP; } + +":" { return ':'; } +"," { return ','; } +"#" { return '#'; } +"%" { return '%'; } +"[" { return '['; } +"]" { return ']'; } +"(" { return '('; } +")" { return ')'; } +"x" { return 'x'; } +"a" { return 'a'; } +"+" { return '+'; } +"M" { return 'M'; } +"*" { return '*'; } +"&" { return '&'; } + +([0][x][a-fA-F0-9]+) { + yylval.number = strtoul(yytext, NULL, 16); + return number; + } +([0][b][0-1]+) { + yylval.number = strtol(yytext + 2, NULL, 2); + return number; + } +(([0])|([-+]?[1-9][0-9]*)) { + yylval.number = strtol(yytext, NULL, 10); + return number; + } +([0][0-9]+) { + yylval.number = strtol(yytext + 1, NULL, 8); + return number; + } +[a-zA-Z_][a-zA-Z0-9_]+ { + yylval.label = strdup(yytext); + return label; + } + +"/*"([^\*]|\*[^/])*"*/" { /* NOP */ } +";"[^\n]* { /* NOP */ } +^#.* { /* NOP */ } +[ \t]+ { /* NOP */ } +[ \n]+ { /* NOP */ } + +. { + printf("unknown character \'%s\'", yytext); + yyerror("lex unknown character"); + } + +%% diff --git a/tools/net/bpf_exp.y b/tools/net/bpf_exp.y new file mode 100644 index 000000000000..f524110643bb --- /dev/null +++ b/tools/net/bpf_exp.y @@ -0,0 +1,749 @@ +/* + * BPF asm code parser + * + * This program is free software; you can distribute it and/or modify + * it under the terms of the GNU General Public License as published + * by the Free Software Foundation; either version 2 of the License, + * or (at your option) any later version. + * + * Syntax kept close to: + * + * Steven McCanne and Van Jacobson. 1993. The BSD packet filter: a new + * architecture for user-level packet capture. In Proceedings of the + * USENIX Winter 1993 Conference Proceedings on USENIX Winter 1993 + * Conference Proceedings (USENIX'93). USENIX Association, Berkeley, + * CA, USA, 2-2. + * + * Copyright 2013 Daniel Borkmann + * Licensed under the GNU General Public License, version 2.0 (GPLv2) + */ + +%{ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "bpf_exp.yacc.h" + +enum jmp_type { JTL, JFL, JKL }; + +extern FILE *yyin; +extern int yylex(void); +extern void yyerror(const char *str); + +extern void bpf_asm_compile(FILE *fp, bool cstyle); +static void bpf_set_curr_instr(uint16_t op, uint8_t jt, uint8_t jf, uint32_t k); +static void bpf_set_curr_label(const char *label); +static void bpf_set_jmp_label(const char *label, enum jmp_type type); + +%} + +%union { + char *label; + uint32_t number; +} + +%token OP_LDB OP_LDH OP_LD OP_LDX OP_ST OP_STX OP_JMP OP_JEQ OP_JGT OP_JGE +%token OP_JSET OP_ADD OP_SUB OP_MUL OP_DIV OP_AND OP_OR OP_XOR OP_LSH OP_RSH +%token OP_RET OP_TAX OP_TXA OP_LDXB OP_MOD OP_NEG OP_JNEQ OP_JLT OP_JLE OP_LDI +%token OP_LDXI + +%token K_PKT_LEN K_PROTO K_TYPE K_NLATTR K_NLATTR_NEST K_MARK K_QUEUE K_HATYPE +%token K_RXHASH K_CPU K_IFIDX K_VLANT K_VLANP K_POFF + +%token ':' ',' '[' ']' '(' ')' 'x' 'a' '+' 'M' '*' '&' '#' '%' + +%token number label + +%type