The two most commonly used datapath implementations are
netlink. They are named after the mechanism used for integrating with the operating system. The
netlink datapath is Linux-specific and uses a special kernel module. The
netdev datapath is implemented in userspace.
Datapaths are optimized for high-performance packet processing and they completely handle most of the packets that come through them. Consequently, the forwarding tables in the datapaths are not equivalent to the OpenFlow forwarding tables and a translation mechanism between the two forwarding table formats is present. The translation is performed lazily. When a packet does not match any rule in the datapath forwarding table, an upcall is made and the packet is passed out of the datapath into the upper layers of
ovs-vswitchd performs a full lookup through the OpenFlow forwarding tables and generates a new datapath flow rule which gets inserted into the datapath together with the packet itself.
We use the name fast path to refer to datapath-only packet processing. If the packet misses all datapath flow rules and generates an upcall, we say that it took the slow path.
netdev datapath is a multiplatform datapath implementation entirely in user space. All major operating systems are currently supported.
The netdev datapath supports various configurations according to the system it is running on. The most basic deployment is rather slow as all packets have to cross the userspace boundary twice. However, the datapath can be configured with several types of accelerators significantly lowering the packet processing cost. The datapath can be accelerated using DPDK, AF_XDP sockets or offloaded to hardware using TC.
This article does not focus on the
netdev datapath any further.
netlink datapath is Linux-specific. It resides partially in a kernel module and partially in user space. The kernel module does most of the packet processing. The userspace communicates with the kernel over a netlink socket, hence the name. Only the slow path passes data from the kernel space to the user space.
From now on, we will focus on the
netlink datapath as it is the most commonly deployed datapath in Kubernetes clusters.
Interactions between the fast path and the slow path
Packets always start their journey in the kernel. They enter the
openvswitch module where they are checked against a set of flow rules stored in the flow table (OVS's forwarding table). If a flow rule matches the packet, its actions are executed. Otherwise, the packet is sent to the user space.
def process_packet_in_kernel(packet): for flow_rule in flow_table: if flow_rule criteria match the packet: execute the flow_rule actions return make an upcall to the user space
The slow path gets involved when the fast path fails (no flow rule match) or when the action in the flow rule explicitly asks for it. The packet is sent to the user space via a netlink socket in an upcall.
ovs-vswitchd receives the packet, processes it according to the OpenFlow rules and reinjects it back into the kernel. Additionally, a new flow rule might be generated and installed into the kernel to speed up the future processing of similar packets.
The packets in the fast path are not buffered and they are processed immediately. The slow-path buffers packets when they are passed from kernel to user space, introducing additional latency.
The data structures
The flow table (
struct flow_table) is an in-kernel data structure, which stores flows rules (
struct sw_flow) and allows for fast matching with individual packets. Similar to OpenFlow flow rules, every datapath flow contains a list of actions (
struct sw_flow_actions), statistics (
struct sw_flow_stats), a flow key (
struct sw_flow_key) and a flow mask (
The flow key and the flow mask are the matching criteria. The combination of both of these enables fast matching with the possibility of wildcards. The key is a complex structure with parsed-out packet header values. It can be created from any packet with the function
key_extract(). The mask specifies which bits are significant when comparing two keys.
def equals(key1, key2, mask) -> bool: return (key1 & mask) == (key2 & mask)
The matching algorithm
When the kernel processes a packet, it creates the packet's corresponding flow key and looks for matching flows in the flow table. The packet can match any number of flows, but only the first matching rule is always used. The userspace component is responsible for preventing overlapping conflicting flow rules.
The flow table stores the flows in a hash table. The lookup key is the masked
sw_flow_key. Therefore, to find a flow for a packet, the kernel has to try several masks. The lookup procedure could look like this:
def lookup(flow_table, key): for mask in flow_table.mask_array: masked_key = apply mask to key if masked_key in flow_table.flows: return flow_table.flows[masked_key] return None
The real implementation (
ovs_flow_tbl_lookup_stats) is in principle similar, but more optimized:
- The kernel keeps mask usage statistics and the
mask_arrayis kept sorted with the most used masks first (
ovs_flow_masks_rebalance()). This happens periodically based on a time interval.
- There is already a
sk_buff(a kernel structure wrapping all packets) hash based on source/destination addresses and ports. The lookup procedure makes use of the hash by having a fixed size (256) hash table storing references to their masks. The cached masks are then tried first. If there is no match, a standard lookup over all masks follows and the cache entry is replaced with a new flow. This helps with burst performance.
The actions in a flow rule are similar to actions in OpenFlow. They are described in the manpages
Special attention should be given to the recirculation action, which corresponds to the
resubmit OpenFlow action. This action resubmits the packet to the datapath again, updating its flow key's
recirc_id to a new value. This effectively simulates having multiple flow tables in the datapath with only a single physical table.
The user space process (
ovs-vswitchd) communicates with the kernel over a netlink socket. When a packet leaves the
fast-path, it is temporarily buffered in a queue (
ovs_dp_upcall()) when crossing the kernel boundary.
ovs-vswitchd reads packets from the kernel in several handler threads. The datapath interface defines a
recv() function for receiving a single packet from the kernel. The netlink datapath implements it with the
Higher up, the
recv() datapath interface function is used in a generic
dpif_recv() function which also provides an useful tracepoint for measurements. Even higher up the abstraction stack, the
recv_upcalls() function in the file
ofproto-dpif-upcall.c reads packets in batches, which are then processed by
handle_upcalls() function essentially transforms the list of packets into a list of operations that should be executed on the datapath. This includes adding new flows to the datapath as well as simply sending packets where they belong.
Parallel with upcall processing in the handler threads, OVS also runs several maintenance tasks:
- A balancing task makes sure that when the system is under stress, the most frequently used flow rules are in the kernel.
- The revalidator threads periodically dump statistics from the kernel and remove old unused flows. The number of revalidator threads scales with the number of available cores on the system.