Multi-Processing:
A network processor contains not one, but many individual processors, which range in complexity from quite limited engines to C++-programmable RISC processors. Different network processors use different strategies to divide up processing, and they have different internal data flows. I will use the term processing element (PE) to refer to the individual processing units of a network processor.
Coprocessors:
In addition to multiple PEs, network processors contain narrow focus coprocessors for tasks such as pattern matching, table lookup, and checksum generation.
The Network Processing Forum is developing an API, called the CPIX API, between the control plane and the data plane, which is to be supported by the majority of network processors.
Q: NUMA?
Older computers had relatively few CPUs per system, which allowed an architecture known as Symmetric Multi-Processor (SMP). This meant that each CPU in the system had similar (or symmetric) access to available memory. In recent years, CPU count-per-socket has grown to the point that trying to give symmetric access to all RAM in the system has become very expensive. Most high CPU count systems these days have an architecture known as Non-Uniform Memory Access (NUMA) instead of SMP.
SMP is fine for a small number of CPUs, but once the CPU count gets
above a certain point (8 or 16), the number of parallel traces required to
allow equal access to memory uses too much of the available board real estate,
leaving less room for peripherals.
Instead, these systems use a Non-Uniform Memory Access (NUMA) strategy, where each package/socket combination has one or more dedicated memory areas for high-speed access. Each socket also has an interconnect to other sockets for slower access to the other sockets' memory.
As a simple NUMA example, suppose we have a two-socket motherboard, where each socket has been populated with a quad-core package. This means the total number of CPUs in the system is eight; four in each socket. Each socket also has an attached memory bank with four gigabytes of RAM, for a total system memory of eight gigabytes. For the purposes of this example, CPUs 0-3 are in socket 0, and CPUs 4-7 are in socket 1. Each socket in this example also corresponds to a NUMA node.
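Below is a minimal C sketch of how software can exploit this layout, using the libnuma library (compile with -lnuma); the node number and 1 MiB allocation size are arbitrary illustration values, not anything mandated by the example above.

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not available on this system\n");
            return 1;
        }

        /* In the two-socket example above this would print 1
         * (nodes 0 and 1). */
        printf("highest NUMA node: %d\n", numa_max_node());

        /* Allocate 1 MiB bound to node 0, so CPUs 0-3 access it
         * locally and CPUs 4-7 must cross the interconnect. */
        size_t len = 1 << 20;
        void *buf = numa_alloc_onnode(len, 0);
        if (buf == NULL) {
            fprintf(stderr, "numa_alloc_onnode failed\n");
            return 1;
        }

        numa_free(buf, len);
        return 0;
    }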
Q: taskset?
taskset retrieves and sets the CPU affinity of a running process (by process ID). CPU affinity is represented as a bitmask. The lowest-order bit corresponds to the first logical CPU, and the highest-order bit corresponds to the last logical CPU. These masks are typically given in hexadecimal, so that 0x00000001 represents processor 0, and 0x00000003 represents processors 0 and 1.
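The same bitmask can be manipulated programmatically; the following is a minimal C sketch using the Linux sched_setaffinity(2)/sched_getaffinity(2) calls, which is roughly what taskset does under the hood.

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>

    int main(void)
    {
        cpu_set_t mask;

        /* Build the mask taskset would write as 0x00000003:
         * bit 0 (CPU 0) and bit 1 (CPU 1) set. */
        CPU_ZERO(&mask);
        CPU_SET(0, &mask);
        CPU_SET(1, &mask);

        /* PID 0 means "the calling process". */
        if (sched_setaffinity(0, sizeof(mask), &mask) == -1) {
            perror("sched_setaffinity");
            return 1;
        }

        /* Read the mask back, as "taskset -p <pid>" would. */
        if (sched_getaffinity(0, sizeof(mask), &mask) == -1) {
            perror("sched_getaffinity");
            return 1;
        }
        printf("CPU 0 allowed: %d, CPU 2 allowed: %d\n",
               CPU_ISSET(0, &mask), CPU_ISSET(2, &mask));
        return 0;
    }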
Q: Huge TLB?
The Huge Translation Lookaside Buffer (HugeTLB) allows memory to be
managed in very large segments so that more address mappings can be cached at
one time. This reduces the probability of TLB misses, which in turn improves
performance in applications with large memory requirements.
Huge pages are blocks of memory that come in 2MB and 1GB sizes. The page tables used by the 2MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1GB pages are best for scaling to terabytes of memory. Huge pages must be assigned at boot time.
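As a concrete illustration, here is a minimal C sketch that maps one 2MB huge page with mmap(2) and MAP_HUGETLB; it assumes the huge-page pool has already been reserved (e.g. via the hugepages= boot parameter), otherwise the mapping fails.

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define LEN (2 * 1024 * 1024)   /* one 2MB huge page */

    int main(void)
    {
        void *addr = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB,
                          -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)");  /* no huge pages reserved? */
            return 1;
        }

        /* Touch the memory; the whole 2MB region is covered by a
         * single TLB entry instead of 512 4KB entries. */
        memset(addr, 0, LEN);

        munmap(addr, LEN);
        return 0;
    }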
Q: How do atomic variables work?
This is actually quite simple. The Intel x86 and x86_64 architectures (as well as the vast majority of other modern CPU architectures) have instructions that allow one to lock the FSB while doing a memory access. FSB stands for Front-Side Bus, the bus the processor uses to communicate with RAM. Locking the FSB prevents any other processor (core), and any process running on it, from accessing RAM while the lock is held. This is exactly what we need to implement atomic variables.
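Here is a minimal C sketch of an atomic counter using C11 <stdatomic.h> (compile with -pthread); on x86/x86_64 the fetch-and-add compiles to a LOCK-prefixed instruction, which performs exactly the bus locking described above.

    #include <stdatomic.h>
    #include <pthread.h>
    #include <stdio.h>

    static atomic_int counter;

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++)
            atomic_fetch_add(&counter, 1);  /* "lock addl" on x86 */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, worker, NULL);
        pthread_create(&t2, NULL, worker, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Always 200000; a plain "counter++" from two threads
         * could lose updates. */
        printf("counter = %d\n", atomic_load(&counter));
        return 0;
    }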
Q: Spin lock
Spin locks are a special kind of lock designed to work in a multiprocessor environment. If the kernel control path finds the spin lock "open," it acquires the lock and continues its execution. Conversely, if the kernel control path finds the lock "closed" by a kernel control path running on another CPU, it "spins" around, repeatedly executing a tight instruction loop, until the lock is released.
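The spinning behaviour is easy to see in a user-space sketch built on C11 atomic_flag; the kernel's real spinlock_t is more elaborate (preemption control, debugging support), so treat this only as an illustration of the idea.

    #include <stdatomic.h>

    typedef struct {
        atomic_flag locked;
    } spinlock_t;

    #define SPINLOCK_INIT { ATOMIC_FLAG_INIT }   /* lock starts "open" */

    static void spin_lock(spinlock_t *lock)
    {
        /* test-and-set returns the previous value: while it returns
         * true, another CPU holds the lock, so spin in a tight loop. */
        while (atomic_flag_test_and_set_explicit(&lock->locked,
                                                 memory_order_acquire))
            ;   /* busy-wait until the holder releases the lock */
    }

    static void spin_unlock(spinlock_t *lock)
    {
        atomic_flag_clear_explicit(&lock->locked, memory_order_release);
    }

A thread brackets its critical section with spin_lock()/spin_unlock(); unlike a mutex, a waiter never sleeps, it simply burns CPU cycles until the lock opens, which is why spin locks suit only very short critical sections.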
Q: Netlink Socket
A netlink socket is a flexible interface for communication between user-space applications and kernel modules. It provides an easy-to-use socket API to both applications and the kernel, along with advanced communication features, such as full-duplex, buffered I/O, multicast, and asynchronous communication, which are absent in other kernel/user-space IPC mechanisms.
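Below is a minimal C sketch that opens a netlink socket of the NETLINK_ROUTE family and subscribes to the RTMGRP_LINK multicast group, so the kernel asynchronously pushes interface up/down events to user space; error handling is trimmed for brevity.

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <linux/netlink.h>
    #include <linux/rtnetlink.h>

    int main(void)
    {
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
        if (fd < 0) {
            perror("socket(AF_NETLINK)");
            return 1;
        }

        /* Bind with a multicast group mask: RTMGRP_LINK asks the
         * kernel to send asynchronous link up/down notifications. */
        struct sockaddr_nl addr;
        memset(&addr, 0, sizeof(addr));
        addr.nl_family = AF_NETLINK;
        addr.nl_groups = RTMGRP_LINK;
        if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("bind");
            close(fd);
            return 1;
        }

        /* Kernel messages now arrive like ordinary socket data. */
        char buf[4096];
        ssize_t len = recv(fd, buf, sizeof(buf), 0);
        printf("received %zd bytes of netlink data\n", len);

        close(fd);
        return 0;
    }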