Wednesday, 24 August 2016

Linux Kernel With Processor

Multi-Processing:
A network processor contains not one, but many individual processors, which range in complexity from quite limited engines to C++-programmable RISC processors. Different network processors use different strategies to divide up processing and have different internal data flows.
I will use the term processing element (PE) to refer to the individual processing units of a network processor.

Coprocessors:
In addition to multiple PEs, network processors contain narrow focus coprocessors for tasks such as pattern matching, table lookup, and checksum generation.

The Network Processing Forum is developing an API, called the CPIX API, between the control plane and the data plane, which is to be supported by the majority of network processors.

Q: NUMA?

Older computers had relatively few CPUs per system, which allowed an architecture known as Symmetric Multi-Processor (SMP). This meant that each CPU in the system had similar (or symmetric) access to available memory. In recent years, the CPU count per socket has grown to the point that trying to give symmetric access to all RAM in the system has become very expensive. Most high CPU count systems these days have an architecture known as Non-Uniform Memory Access (NUMA) instead of SMP.

SMP is fine for a small number of CPUs, but once the CPU count gets above a certain point (8 or 16), the number of parallel traces required to allow equal access to memory uses too much of the available board real estate, leaving less room for peripherals.


These systems instead use a Non-Uniform Memory Access (NUMA) strategy, where each package/socket combination has one or more dedicated memory areas for high-speed access. Each socket also has an interconnect to the other sockets for slower access to those sockets' memory.

As a simple NUMA example, suppose we have a two-socket motherboard, where each socket has been populated with a quad-core package. This means the total number of CPUs in the system is eight; four in each socket. Each socket also has an attached memory bank with four gigabytes of RAM, for a total system memory of eight gigabytes. For the purposes of this example, CPUs 0-3 are in socket 0, and CPUs 4-7 are in socket 1. Each socket in this example also corresponds to a NUMA node.
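
For illustration, here is a minimal user-space sketch using libnuma (an assumption: the library and its headers are installed, and the program is linked with -lnuma) that pins the calling task to node 0 and allocates memory from that node's local bank:

/* Minimal NUMA sketch using libnuma (link with -lnuma).
 * Pins the current task to node 0 and allocates memory
 * from node 0's local bank, so accesses stay "near". */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    numa_run_on_node(0);                     /* run only on CPUs of node 0  */
    size_t len = 4096;
    char *buf = numa_alloc_onnode(len, 0);   /* memory backed by node 0 RAM */
    if (!buf)
        return 1;

    memset(buf, 0, len);                     /* touch pages: local access   */
    printf("max node: %d\n", numa_max_node());

    numa_free(buf, len);
    return 0;
}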





Q: taskset?


taskset retrieves and sets the CPU affinity of a running process (by process ID).
CPU affinity is represented as a bitmask. The lowest-order bit corresponds to the first logical CPU, and the highest-order bit corresponds to the last logical CPU. These masks are typically given in hexadecimal, so that 0x00000001 represents processor 0, and 0x00000003 represents processors 0 and 1.
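
The same affinity mask can also be set programmatically. A small sketch using the Linux sched_setaffinity()/CPU_SET() interface (glibc, _GNU_SOURCE) that pins the calling process to processors 0 and 1, i.e. the 0x00000003 mask above:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);        /* bit 0 -> logical CPU 0 */
    CPU_SET(1, &set);        /* bit 1 -> logical CPU 1, mask = 0x3 */

    /* pid 0 means "the calling process", same as taskset -p <mask> <pid> */
    if (sched_setaffinity(0, sizeof(set), &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    printf("now restricted to CPUs 0 and 1 (pid %d)\n", getpid());
    return 0;
}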

Q: Huge TLB?

The Huge Translation Lookaside Buffer (HugeTLB) allows memory to be managed in very large segments so that more address mappings can be cached at one time. This reduces the probability of TLB misses, which in turn improves performance in applications with large memory requirements.


Huge pages are blocks of memory that come in 2MB and 1GB sizes on x86_64. The page tables used by the 2MB pages are suitable for managing multiple gigabytes of memory, whereas the page tables of 1GB pages are best for scaling to terabytes of memory.
1GB huge pages typically must be reserved at boot time (via the hugepagesz= and hugepages= kernel parameters), while 2MB huge pages can also be allocated at runtime, for example through /proc/sys/vm/nr_hugepages.
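
As a rough sketch of how an application asks for huge pages directly, the mmap() flag MAP_HUGETLB requests a huge-page-backed mapping (this assumes some 2MB huge pages have already been reserved, e.g. through /proc/sys/vm/nr_hugepages):

#define _GNU_SOURCE
#include <sys/mman.h>
#include <stdio.h>
#include <string.h>

#define HUGE_PAGE_SIZE (2UL * 1024 * 1024)   /* one 2MB huge page */

int main(void)
{
    /* MAP_HUGETLB asks the kernel to back this mapping with huge pages;
     * it fails with ENOMEM if none have been reserved. */
    void *p = mmap(NULL, HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }

    memset(p, 0, HUGE_PAGE_SIZE);            /* touch the page */
    printf("got a huge-page backed mapping at %p\n", p);

    munmap(p, HUGE_PAGE_SIZE);
    return 0;
}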

Q: How atomic variables work

This is actually quite simple. The Intel x86 and x86_64 architectures (as well as the vast majority of other modern CPU architectures) have instructions that allow one to lock the FSB while performing a memory access. FSB stands for Front-Side Bus; it is the bus the processor uses to communicate with RAM. Locking the FSB prevents any other processor (core), and any process running on that processor, from accessing RAM during the operation (modern CPUs usually lock only the relevant cache line rather than the whole bus). This is exactly what we need to implement atomic variables.
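
In practice the compiler emits the lock-prefixed instruction for you. A small C11 sketch where atomic_fetch_add() becomes a locked add/xadd on x86_64, so two threads incrementing the same counter never lose an update:

#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int counter = 0;              /* shared between threads */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        atomic_fetch_add(&counter, 1);   /* on x86_64: a "lock"-prefixed add/xadd */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* always 200000: the locked read-modify-write cannot be interleaved */
    printf("counter = %d\n", atomic_load(&counter));
    return 0;
}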


Q: Spin lock


Spin locks are a special kind of lock designed to work in a multiprocessor environment. If the kernel control path finds the spin lock "open," it acquires the lock and continues its execution. Conversely, if the kernel control path finds the lock "closed" by a kernel control path running on another CPU, it "spins" around, repeatedly executing a tight instruction loop, until the lock is released.
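
Inside the kernel this is the spin_lock()/spin_unlock() API from <linux/spinlock.h>. As a user-space sketch of the same idea, a C11 atomic_flag can act as the lock, with the busy-wait loop made explicit:

#include <stdatomic.h>

/* "open" when the flag is clear, "closed" when it is set */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_acquire(void)
{
    /* test-and-set returns the previous value; spin while it was already
     * set, i.e. while some other CPU currently holds the lock */
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;   /* tight instruction loop: just keep retrying */
}

static void spin_release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

A production spin lock would also use a CPU pause/relax hint inside the loop and, in the kernel, disable preemption while the lock is held; this sketch only shows the spinning behaviour itself.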

Q: NetLink Socket


Netlink socket is a flexible interface for communication between user-space applications and kernel modules. It provides an easy-to-use socket API to both applications and the kernel. It provides advanced communication features, such as full-duplex, buffered I/O, multicast and asynchronous communication, which are absent in other kernel/user-space IPCs.
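
A minimal user-space sketch that opens a netlink socket (here the NETLINK_ROUTE family, as an example) and binds it, which is the usual first step before exchanging messages with the kernel:

#include <sys/socket.h>
#include <linux/netlink.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* NETLINK_ROUTE carries routing/link/address messages; other
     * protocols (or a custom kernel module's family) work the same way */
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        perror("socket(AF_NETLINK)");
        return 1;
    }

    struct sockaddr_nl addr;
    memset(&addr, 0, sizeof(addr));
    addr.nl_family = AF_NETLINK;
    addr.nl_pid    = getpid();   /* unicast address of this process */
    addr.nl_groups = 0;          /* no multicast groups subscribed  */

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("bind");
        close(fd);
        return 1;
    }

    printf("netlink socket ready (fd %d)\n", fd);
    close(fd);
    return 0;
}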

