We give a brief overview of the Linux socket filtering framework.
Starting with raw sockets, we look at how filters are attached to
sockets and then at the Berkeley Packet Filter (BPF) machine
implemented in the kernel.
Introduction
The Linux socket filter is a register-based packet filtering machine.
It has two components: a network tap and a packet filter. The tap
handles the copying and delivery of packets; the filter decides
whether a packet is copied at all. The user can specify how many bytes
of each packet should be copied for their use. Only the bytes needed
for filtering are ever referenced.
Raw Sockets
A user attaches a filter to a socket by calling the setsockopt system
call with either the SO_ATTACH_FILTER or the SO_ATTACH_BPF option. In
our case we assume the socket was created as a SOCK_RAW socket.
The entry for SOCK_RAW in the inetsw_array (net/ipv4/af_inet.c) names
raw_prot as its protocol and inet_sockraw_ops as its socket operations.
Going back to the function ip_local_deliver_finish, this is where the
IP header is checked to find the protocol to which the incoming packet
is dispatched.
There, the function raw_local_deliver determines whether the packet
is meant for delivery to a raw socket, that is, whether a raw socket
is waiting on the protocol of the packet. If there is one, the packet
must be delivered to it. The __raw_v4_lookup function looks up the raw
sockets that are interested in this skb; the lookup can match on
protocol, source address, destination address and device interface
index. For a matching socket, the skb is cloned with skb_clone and
raw_rcv is called on the socket with the clone. raw_rcv resets the
packet data pointer so that it points at the IP header again and
finally adds the packet to the socket’s receive queue: it eventually
calls sock_queue_rcv_skb, which checks whether the packet should be
filtered by calling the sk_filter function.
If a socket filter is attached to the raw socket, it is run at this
point.
The packet filter details
The call to sk_filter is passed a socket and an skb. If a socket
filter has been attached, it is found inside the socket as struct
sk_filter *filter, and SK_RUN_FILTER is called on it. The return value
is the number of bytes of the packet that the user program is
interested in; a return value of 0 means the packet is dropped.
When the packet filter is created, the program that the user passed
in is copied into the socket filter and JIT-compiled via
bpf_jit_compile. If no JIT is available, bpf_migrate_filter translates
the program into the kernel's internal instruction format, which the
interpreter then runs.
Understanding the instruction set
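The BPF engine contains a low-level, asm-like filter language. The
machine consists of the following basic elements: a 32-bit accumulator
A, a 32-bit index register X, and a scratch memory store M[] of
sixteen 32-bit slots.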
An instruction has four fields: op, jt, jf and k. op is a 16-bit
opcode, jt and jf are two 8-bit jump targets, where jt is the jump
target if the condition is true and jf is the jump target if it is
false, and k is a 32-bit instruction-dependent argument.
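The instruction layout described above corresponds to struct
sock_filter in the userspace header linux/filter.h. As a small sketch,
the snippet below builds one conditional jump with the BPF_JUMP macro
from that header and decodes the instruction class from the low bits
of the opcode. The names example_jeq and insn_class_name are made up
for this illustration.

```c
#include <stddef.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_JUMP, BPF_CLASS, opcode bits */

/* "jeq #0x800" with jt = 0 and jf = 3: if A == 0x800 continue with the
 * next instruction, otherwise skip the following three instructions. */
static const struct sock_filter example_jeq =
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x800, 0, 3);

/* Hypothetical helper: name the instruction class encoded in the low
 * three bits of the opcode field. */
static const char *insn_class_name(const struct sock_filter *insn)
{
    switch (BPF_CLASS(insn->code)) {
    case BPF_LD:   return "ld";
    case BPF_LDX:  return "ldx";
    case BPF_ST:   return "st";
    case BPF_STX:  return "stx";
    case BPF_ALU:  return "alu";
    case BPF_JMP:  return "jmp";
    case BPF_RET:  return "ret";
    default:       return "misc";   /* BPF_MISC */
    }
}
```

The four fields can be read directly as example_jeq.code,
example_jeq.jt, example_jeq.jf and example_jeq.k; together they occupy
8 bytes per instruction.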
The following table lists the instructions based on the above machine
model, with A as the accumulator, X as the index register and M as
scratch memory.
Along with the above instructions we have the following addressing
modes for addressing locations in packets, in scratch memory and
registers.
Linux also has some extensions to BPF that allow more convenient
access to frequently needed data.
To get a feel for some of these instructions and their usage, consider
a sample BPF program.
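As one possible illustration, here is a small classic BPF program
written as a C array, together with a toy userspace interpreter for
just the opcodes it uses. The program keeps IPv4/TCP frames and drops
everything else; its offsets assume an Ethernet header in front of the
IP header, as on a packet socket. The interpreter run_filter is a
simplified model written for this article, not the kernel's
implementation, and the names tcp_only and TCP_ONLY_LEN are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/filter.h>   /* struct sock_filter, BPF_STMT, BPF_JUMP */

/* Accept IPv4/TCP, drop the rest (Ethernet framing assumed). */
static const struct sock_filter tcp_only[] = {
    BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),         /* ldh [12]: A = EtherType    */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x0800, 0, 3), /* jeq #0x800, else goto drop */
    BPF_STMT(BPF_LD  | BPF_B   | BPF_ABS, 23),         /* ldb [23]: A = IP protocol  */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 6, 0, 1),      /* jeq #6, else goto drop     */
    BPF_STMT(BPF_RET | BPF_K, 0xffff),                 /* ret: keep up to 65535 bytes */
    BPF_STMT(BPF_RET | BPF_K, 0),                      /* drop: keep nothing         */
};
#define TCP_ONLY_LEN (sizeof(tcp_only) / sizeof(tcp_only[0]))

/* Toy interpreter for the four opcodes used above. Returns the filter
 * verdict: the number of bytes to keep, where 0 means drop. Unknown
 * opcodes and out-of-bounds loads drop the packet. */
static uint32_t run_filter(const struct sock_filter *prog, size_t n,
                           const uint8_t *pkt, size_t len)
{
    uint32_t A = 0;                         /* accumulator */

    for (size_t pc = 0; pc < n; pc++) {
        const struct sock_filter *i = &prog[pc];

        switch (i->code) {
        case BPF_LD | BPF_H | BPF_ABS:      /* 16-bit big-endian load */
            if (len < 2 || i->k > len - 2)
                return 0;
            A = (uint32_t)pkt[i->k] << 8 | pkt[i->k + 1];
            break;
        case BPF_LD | BPF_B | BPF_ABS:      /* 8-bit load */
            if (i->k >= len)
                return 0;
            A = pkt[i->k];
            break;
        case BPF_JMP | BPF_JEQ | BPF_K:     /* offsets are relative to pc + 1 */
            pc += (A == i->k) ? i->jt : i->jf;
            break;
        case BPF_RET | BPF_K:
            return i->k;
        default:
            return 0;
        }
    }
    return 0;                               /* fell off the end: drop */
}
```

Calling run_filter(tcp_only, TCP_ONLY_LEN, frame, frame_len) on a
captured Ethernet frame gives the verdict the program would produce
for it, which makes it easy to see how jt and jf steer execution
towards one of the two ret instructions.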
We can save such a program into a file, say tcp.f, and run the
accompanying assembler (bpf_asm, in the tools/net directory of the
kernel source) on it. By default it generates code that can be loaded
directly by bpf_dbg, with the commands converted into their opcodes.
Often one needs the BPF code in the syntax used to specify filters in
C code. We can get this by running the assembler with the -c option.
This code can then be used in one’s C code when attaching a filter to
a socket.
Since tools like tcpdump use the libpcap library to compile
user-specified filter expressions to BPF, they are a helpful aid for
quickly generating BPF code. This is where the tcpdump options -d, -dd
and -ddd come in. -d prints the mnemonic code that tcpdump generates
for an expression, -dd prints it as a C code fragment, and -ddd prints
it as decimal numbers that can be loaded directly into bpf_dbg and
other tools.
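To make the relation between the last two output styles concrete, the
sketch below formats a single instruction both ways: as a brace
initializer in the style of -dd and as four decimal numbers in the
style of -ddd. The helpers format_dd and format_ddd are hypothetical
and only imitate the per-instruction formatting; as far as I recall,
the real -ddd output additionally starts with a line holding the
instruction count.

```c
#include <stdio.h>
#include <linux/filter.h>   /* struct sock_filter */

/* Hypothetical helper: one instruction as a C initializer (-dd style). */
static int format_dd(const struct sock_filter *i, char *buf, size_t len)
{
    return snprintf(buf, len, "{ 0x%x, %u, %u, 0x%08x },",
                    (unsigned)i->code, (unsigned)i->jt,
                    (unsigned)i->jf, (unsigned)i->k);
}

/* Hypothetical helper: one instruction as decimal numbers (-ddd style). */
static int format_ddd(const struct sock_filter *i, char *buf, size_t len)
{
    return snprintf(buf, len, "%u %u %u %u",
                    (unsigned)i->code, (unsigned)i->jt,
                    (unsigned)i->jf, (unsigned)i->k);
}
```

For the instruction ldh [12], for example, the first helper produces
{ 0x28, 0, 0, 0x0000000c }, and the second produces 40 0 0 12. Both
carry the same four fields, only in different notation.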
One additional tool worth mentioning in the context of BPF is bpf_dbg
(in the tools/net directory of the kernel source), which can be used
to debug BPF filters over pcap files. It is a BPF debugger that lets
us run BPF code, step through it and more. Its load command loads
captured packets and BPF code, which can then be stepped through and
debugged.
Other helpful commands available in the debugger include run, step,
breakpoint, select, disassemble and dump.
Complete Example
Finally, to wrap up, we present a complete example program that uses
raw sockets to listen to HTTP requests and responses and prints them
to standard output. It is just a modification of the code presented in
the binary tides article, for illustration purposes.
Summary
This article scratches the surface of the flexibility offered by the
Linux networking stack. Since it is more of a brain dump from reading
through the networking subsystem source of the 3.19-rc7 kernel, it may
be highly unreliable and inaccurate. Users beware! Any errors in the
article are entirely my fault.