Tuesday 17 July 2018

DPDK works

Processing Packets in Linux: Main Stages

When a network card first receives a packet, it sends it to a receive queue, or RX. From there, it gets copied to the main memory via the DMA (Direct Memory Access) mechanism.
Direct memory access (DMA) is a method that allows an input/output (I/O) device to send or receive data directly to or from the main memory, bypassing the CPU to speed up memory operations. The process is managed by a chip known as a DMA controller (DMAC).

DMA channels are used to communicate data between the peripheral device and the system memory.With DMA, the CPU can process other tasks while data transfer is being performed. The transfer of data is first initiated by the CPU. During the transfer of data between the DMA channel and I/O device, the CPU performs other tasks. When the data transfer is complete, the CPU receives an interrupt request from the DMA controller.

Afterwards, the system needs to be notified of the new packet and pass the data onto a specially allocated buffer (Linux allocates these buffers for every packet). To do this, Linux uses an interrupt mechanism: an interrupt is generated several times when a new packet enters the system. The packet then needs to be transferred to the user space.
One bottleneck is already apparent: as more packets have to be processed, more resources are consumed, which negatively affects the overall system performance.
As we’ve already said, these packets are saved to specially allocated buffers—more specifically, the sk_buff struct. This struct is allocated for each packet and becomes free when a packet enters the user space. This operation consumes a lot of bus cycles (i.e. cycles that transfer data from the CPU to the main memory).

DPDK: How It Works

General Features

Let’s look at the following illustration:
pr-3303
The DPDK also requires hugepages be configured. This is required for allocating large chunks of memory and writing data to them. We can say that hugepages does the same job in DPDK that DMA does in traditional packet processing.

 let’s go over the main stages of packet processing with the DPDK:
  1. Incoming packets go to a ring buffer (we’ll look at its setup in the next section). The application periodically checks this buffer for new packets.
  2. If the buffer contains new packet descriptors, the application will refer to the DPDK packet buffers in the specially allocated memory pool using the pointers in the packet descriptors.
  3. If the ring buffer does not contain any packets, the application will queue the network devices under the DPDK and then refer to the ring again.

EAL: Environment Abstraction(rte_eal )

Drivers and libraries for Linux and the BSD system are saved in this directory. It also contains a set of header files for various processor architectures: ARM, x86, TILE64, and PPC64.
All of the applications that use the DPDK (see here for examples) must include the EAL’s header files.
The most commonly of these include:
  • rte_lcore.h — manages processor cores and sockets;
  • rte_memory.h — manages memory;
  • rte_pci.h — provides the interface access to PCI address space;
  • rte_debug.h — provides trace and debug functions (logging, dump_stack, and more);
  • rte_interrupts.h — processes interrupts.

Managing Queues: rte_ring

As we’ve already said, packets received by the network card are sent to a ring buffer,which acts as a receiving queue. Packets received in the DPDK are also sent to a queue implemented on the rte_ring library. 

The queue is a lockless ring buffer built on the FIFO (First In, First Out) principle. The ring buffer is a table of pointers for objects that can be saved to the memory. Pointers can be divided into four categories: prod_tail, prod_head, cons_tail, cons_head.

Prods is short for producer, and cons for consumer. The producer is the process that writes data to the buffer at a given time, and the consumer is the process that removes data from the buffer.

The tail is where writing takes place on the ring buffer. The place the buffer is read from at a given time is called the head.

The drawback to DPDK’s ring buffer is its fixed size, which cannot be increased on the fly. Additionally, much more memory is spent working with the ring structure than in a linked queue since the buffer always uses the the maximum number of pointers.

Memory Management: rte_mempool

We mentioned above that DPDK requires hugepages. The installation instructions recommend creating 2MB hugepages.

These pages are combined in segments, which are then divided into zones. Objects that are created by applications or other libraries, like queues and packet buffers, are placed in these zones.
These objects include memory pools, which are created by the rte_mempool library. These are fixed size object pools that use rte_ring for storing free objects and can be identified by a unique name.
Memory alignment techniques can be implemented to improve performance.
Even though access to free objects is designed on a lockless ring buffer, consumption of system resources may still be very high. As multiple cores have access to the ring, a compare-and-set (CAS) operation usually has to be performed each time it is accessed.
To prevent bottlenecking, every core is given an additional local cache in the memory pool. Using the locking mechanism, cores can fully access the free object cache. When the cache is full or entirely empty, the memory pool exchanges data with the ring buffer. This gives the core access to frequently used objects.
Buffer Management: rte_mbuf
In the Linux network stack, all network packets are represented by the the sk_buff data structure. In DPDK, this is done using the rte_mbuf struct, which is described in the rte_mbuf.h header file.
The buffer management approach in DPDK is reminiscent of the approach used in FreeBSD: instead of one big sk_buff struct, there are many smaller rte_mbuf buffers. The buffers are created before the DPDK application is launched and are saved in memory pools (memory is allocated by rte_mempool).
In addition to its own packet data, each buffer contains metadata (message type, length, data segment starting address).The buffer also contains pointers for the next buffer. This is needed when handling packets with large amounts of data. In cases like these, packets can be combined.

Other Libraries: General Overview

In previous sections, we talked about the most basic DPDK libraries. There’s a great deal of other libraries, but one article isn’t enough to describe them all. Thus, we’ll be limiting ourselves to just a brief overview.
With the LPM library, DPDK runs the Longest Prefix Match (LPM) algorithm, which can be used to forward packets based on their IPv4 address. The primary function of this library is to add and delete IP addresses as well as to search for new addresses using the LPM algorithm.
Other libraries offer similar functionality based on hash functions. With rte_hash, you can search through a large record set using a unique key. This library can be used for classifying and distributing packets, for example.
The rte_timer library lets you execute functions asynchronously







No comments:

Post a Comment