Hopelessly out of date /dev/null | 345 -------------------------------------------------------------- 1 files changed, 345 deletions(-) diff -puN -L Documentation/smp.tex Documentation/smp.tex~remove-smp-txt /dev/null --- 25/Documentation/smp.tex +++ /dev/null Thu Apr 11 07:25:15 2002 @@ -1,346 +0,0 @@ -From: michael@Physik.Uni-Dortmund.DE (Michael Dirkmann) - -thanks for your information. Attached is the tex-code of your -SMP-documentation : --=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= -\documentclass[]{article} -\parindent0.0cm -\parskip0.2cm - -\begin{document} - -\begin{center} -\LARGE \bf -An Implementation Of Multiprocessor Linux -\normalsize -\end{center} - -{ \it -This document describes the implementation of a simple SMP -Linux kernel extension and how to use this to develop SMP Linux kernels for -architectures other than the Intel MP v1.1 architecture for Pentium and 486 -processors.} - -\hfill Alan Cox, 1995 - - -The author wishes to thank Caldera Inc. ( http://www.caldera.com ) -whose donation of an ASUS dual Pentium board made this project possible, -and Thomas Radke, whose initial work on multiprocessor Linux formed -the backbone of this project. - -\section{Background: The Intel MP specification.} -Most IBM PC style multiprocessor motherboards combine Intel 486 or Pentium -processors and glue chipsets with a hardware/software specification. The -specification places much of the onus for hard work on the chipset and -hardware rather than the operating system. - -The Intel Pentium processors have a wide variety of inbuilt facilities for -supporting multiprocessing, including hardware cache coherency, built in -interprocessor interrupt handling and a set of atomic test and set, -exchange and similar operations. The cache coherency in particular makes the -operating system's job far easier. - -The specification defines a detailed configuration structure in ROM that -the boot up processor can read to find the full configuration of the -processors and buses. It also defines a procedure for starting up the -other processors. - - -\section{Mutual Exclusion Within A Single Processor Linux Kernel} -For any kernel to function in a sane manner it has to provide internal -locking and protection of its own tables to prevent two processes updating -them at once and for example allocating the same memory block. There are -two strategies for this within current Unix and Unixlike kernels. -Traditional Unix systems from the earliest of days use a scheme of 'Coarse -Grained Locking' where the entire kernel is protected by a small number of -locks only. Some modern systems use fine grained locking. Because fine -grained locking has more overhead it is normally used only on -multiprocessor kernels and real time kernels. In a real time kernel the -fine grained locking reduces the amount of time locks are held and reduces -the critical (to real time programming at least) latency times. - -Within the Linux kernel certain guarantees are made. No process running in -kernel mode will be pre-empted by another kernel mode process unless it -voluntarily sleeps. This ensures that blocks of kernel code are -effectively atomic with respect to other processes and greatly simplifies -many operations. Secondly interrupts may pre-empt a kernel running process, -but will always return to that process. A process in kernel mode may -disable interrupts on the processor and guarantee such an interruption will -not occur. The final guarantee is that an interrupt will not be pre-empted -by a kernel task. That is interrupts will run to completion or be -pre-empted by other interrupts only. - -The SMP kernel chooses to continue these basic guarantees in order to make -initial implementation and deployment easier. A single lock is maintained -across all processors. This lock is required to access the kernel space. -Any processor may hold it and once it is held may also re-enter the kernel -for interrupts and other services whenever it likes until the lock is -relinquished. This lock ensures that a kernel mode process will not be -pre-empted and ensures that blocking interrupts in kernel mode behaves -correctly. This is guaranteed because only the processor holding the lock -can be in kernel mode, only kernel mode processes can disable interrupts -and only the processor holding the lock may handle an interrupt. - -Such a choice is however poor for performance. In the longer term it is -necessary to move to finer grained parallelism in order to get the best -system performance. This can be done hierarchically by gradually refining -the locks to cover smaller areas. With the current kernel highly CPU bound -process sets perform well but I/O bound task sets can easily degenerate to -near single processor performance levels. This refinement will be needed to -get the best from Linux/SMP. - -\subsection{Changes To The Portable Kernel Components} -The kernel changes are split into generic SMP support changes and -architecture specific changes necessary to accommodate each different -processor type Linux is ported to. - - -\subsubsection{Initialisation} -The first problem with a multiprocessor kernel is starting the other -processors up. Linux/SMP defines that a single processor enters the normal -kernel entry point start\_kernel(). Other processors are assumed not to be -started or to have been captured elsewhere. The first processor begins the -normal Linux initialisation sequences and sets up paging, interrupts and -trap handlers. After it has obtained the processor information about the -boot CPU, the architecture specific function - - -{\tt \bf{void smp\_store\_cpu\_info(int processor\_id) }} - -is called to store any information about the processor into a per processor -array. This includes things like the bogomips speed ratings. - -Having completed the kernel initialisation the architecture specific -function - -{\tt \bf void smp\_boot\_cpus(void) } - -is called and is expected to start up each other processor and cause it to -enter start\_kernel() with its paging registers and other control -information correctly loaded. Each other processor skips the setup except -for calling the trap and irq initialisation functions that are needed on -some processors to set each CPU up correctly. These functions will -probably need to be modified in existing kernels to cope with this. - - -Each additional CPU then calls the architecture specific function - -{\tt \bf void smp\_callin(void)} - -which does any final setup and then spins the processor while the boot -up processor forks off enough idle threads for each processor. This is -necessary because the scheduler assumes there is always something to run. -Having generated these threads and forked init the architecture specific - -{\tt \bf void smp\_commence(void)} - -function is invoked. This does any final setup and indicates to the system -that multiprocessor mode is now active. All the processors spinning in the -smp\_callin() function are now released to run the idle processes, which -they will run when they have no real work to process. - - -\subsubsection{Scheduling} -The kernel scheduler implements a simple but very effective task -scheduler. The basic structure of this scheduler is unchanged in the -multiprocessor kernel. A processor field is added to each task, and this -maintains the number of the processor executing a given task, or a magic -constant (NO\_PROC\_ID) indicating the job is not allocated to a processor. - -Each processor executes the scheduler itself and will select the next task -to run from all runnable processes not allocated to a different processor. -The algorithm used by the selection is otherwise unchanged. This is -actually inadequate for the final system because there are advantages to -keeping a process on the same CPU, especially on processor boards with per -processor second level caches. - -Throughout the kernel the variable 'current' is used as a global for the -current process. In Linux/SMP this becomes a macro which expands to -current\_set[smp\_processor\_id()]. This enables almost the entire kernel to -be unaware of the array of running processors, but still allows the SMP -aware kernel modules to see all of the running processes. - -The fork system call is modified to generate multiple processes with a -process id of zero until the SMP kernel starts up properly. This is -necessary because process number 1 must be init, and it is desirable that -all the system threads are process 0. - -The final area within the scheduling of processes that does cause problems -is the fact the uniprocessor kernel hard codes tests for the idle threads -as task[0] and the init process as task[1]. Because there are multiple idle -threads it is necessary to replace these with tests that the process id is -0 and a search for process ID 1, respectively. - -\subsubsection{Memory Management} -The memory management core of the existing Linux system functions -adequately within the multiprocessor framework providing the locking is -used. Certain processor specific areas do need changing, in particular -invalidate() must invalidate the TLBs of all processors before it returns. - - -\subsubsection{Miscellaneous Functions} -The portable SMP code rests on a small set of functions and variables -that are provided by the processor specification functionality. These are - -{\tt \bf int smp\_processor\_id(void) } - -which returns the identity of the processor the call is executed upon. This -call is assumed to be valid at all times. This may mean additional tests -are needed during initialisation. - - -{\tt \bf int smp\_num\_cpus;} - -This is the number of processors in the system. \ - -{\tt \bf void smp\_message\_pass(int target, int msg, unsigned long data, - int wait)} - -This function passes messages between processors. At the moment it is not -sufficiently defined to sensibly document and needs cleaning up and further -work. Refer to the processor specific code documentation for more details. - - -\subsection{Architecture Specific Code For the Intel MP Port} -The architecture specific code for the Intel port splits fairly cleanly -into four sections. Firstly the initialisation code used to boot the -system, secondly the message handling and support code, thirdly the -interrupt and kernel syscall entry function handling and finally the -extensions to standard kernel facilities to cope with multiple processors. - -\subsubsection{Initialisation} -The Intel MP architecture captures all the processors except for a single -processor known as the 'boot processor' in the BIOS at boot time. Thus a -single processor enters the kernel bootup code. The first processor -executes the bootstrap code, loads and uncompresses the kernel. Having -unpacked the kernel it sets up the paging and control registers then enters -the C kernel startup. - -The assembler startup code for the kernel is modified so that it can be -used by the other processors to do the processor identification and various -other low level configurations but does not execute those parts of the -startup code that would damage the running system (such as clearing the BSS -segment). - -In the initialisation done by the first processor the arch/i386/mm/init -code is modified to scan the low page, top page and BIOS for intel MP -signature blocks. This is necessary because the MP signature blocks must -be read and processed before the kernel is allowed to allocate and destroy -the page at the top of low memory. Having established the number of -processors it reserves a set of pages to provide a stack come boot up area -for each processor in the system. These must be allocated at startup to -ensure they fall below the 1Mb boundary. - -Further processors are started up in smp\_boot\_cpus() by programming the -APIC controller registers and sending an inter-processor interrupt (IPI) to -the processor. This message causes the target processor to begin executing -code at the start of any page of memory within the lowest 1Mb, in 16bit -real mode. The kernel uses the single page it allocated for each processor -to use as stack. Before booting a given CPU the relocatable code from -trampoline.S and trampoline32.S is copied to the bottom of its stack page -and used as the target for the startup. - -The trampoline code calculates the desired stack base from the code -segment (since the code segment on startup is the bottom of the stack), - enters 32bit mode and jumps to the kernel entry assembler. This as -described above is modified to only execute the parts necessary for each -processor, and then to enter start\_kernel(). On entering the kernel the -processor initialises its trap and interrupt handlers before entering -smp\_callin(), where it reports its status and sets a flag that causes the -boot processor to continue and look for further processors. The processor -then spins until smp\_commence() is invoked. - -Having started each processor up the smp\_commence( ) function flips a -flag. Each processor spinning in smp\_callin() then loads the task register -with the task state segment (TSS) of its idle thread as is needed for task -switching. - -\subsubsection{Message Handling and Support Code} -The architecture specific code implements the smp\_processor\_id() function -by querying the APIC logical identity register. Because the APIC isn't -mapped into the kernel address space at boot, the initial value returned is -rigged by setting the APIC base pointer to point at a suitable constant. -Once the system starts doing the SMP setup (in smp\_boot\_cpus()), the APIC -is mapped with a vremap() call and the apic pointer is adjusted -appropriately. From then on the real APIC logical identity register is -read. - -Message passing is accomplished using a pair of IPIs on interrupt 13 -(unused by the 80486 FPUs in SMP mode) and interrupt 16. Two are used in -order to separate messages that cannot be processed until the receiver -obtains the kernel spinlock from messages that can be processed -immediately. In effect IRQ 13 is a fast IRQ handler that does not obtain -the locks, and cannot cause a reschedule, while IRQ 16 is a slow IRQ that -must acquire the kernel spinlocks and can cause a reschedule. This -interrupt is used for passing on slave timer messages from the processor -that receives the timer interrupt to the rest of the processors, so that -they can reschedule running tasks. - - -\subsubsection{Entry And Exit Code} -A single spinlock protects the entire kernel. The interrupt handlers, the -syscall entry code and the exception handlers all acquire the lock before -entering the kernel proper. When the processor is trying to acquire the -spinlock it spins continually on the lock with interrupts disabled. This -causes a specific deadlock problem. The lock owner may need to send an -invalidate request to the rest of the processors and wait for these to -complete before continuing. A processor spinning on the lock would not be -able to do this. Thus the loop of the spinlock tests and handles invalidate -requests. If the invalidate bit for the spinning CPU is set the processor -invalidates its TLB and atomically clears the bit. When the spinlock is -obtained that processor will take an IPI and in the IPI test the bit and -skip the invalidate as the bit is clear. - -One complexity of the spinlock is that a process running in kernel mode -can sleep voluntarily and be pre-empted. A switch from such a process to a -process executing in user space may reduce the lock count. To track this -the kernel uses a syscall\_count and a per process lock\_depth parameter to -track the kernel lock state. The switch\_to() function is modified in SMP -mode to adjust the lock appropriately. - -The final problem is the idle thread. In the single processor kernel the -idle thread executes 'hlt' instructions. This saves power and reduces the -running temperature of the processors when they are idle. However it means -the process spends all its time in kernel mode and would thus hold the -kernel spinlock. The SMP idle thread continually reschedules a new task and -returns to user mode. This is far from ideal and will be modified to use -'hlt' instructions and release the spinlock soon. Using 'hlt' is even more -beneficial on a multiprocessor system as it almost completely takes an idle -processor off the bus. - -Interrupts are distributed by an i82489 APIC. This chip is set up to work -as an emulation of the traditional PC interrupt controllers when the -machine boots (so that an Intel MP machine boots one CPU and PC -compatible). The kernel has all the relevant locks but does not yet -reprogram the 82489 to deliver interrupts to arbitrary processors as it -should. This requires further modification of the standard Linux interrupt -handling code, and is particularly messy as the interrupt handler behaviour -has to change as soon as the 82489 is switched into SMP mode. - - -\subsubsection{Extensions To Standard Facilities} -The kernel maintains a set of per processor control information such as -the speed of the processor for delay loops. These functions on the SMP -kernel look the values up in a per processor array that is set up from the -data generated at boot up by the smp\_store\_cpu\_info() function. This -includes other facts such as whether there is an FPU on the processor. The -current kernel does not handle floating point correctly, this requires some -changes to the techniques the single CPU kernel uses to minimise floating -point processor reloads. - -The highly useful atomic bit operations are prefixed with the 'lock' -prefix in the SMP kernel to maintain their atomic properties when used -outside of (and by) the spinlock and message code. Amongst other things -this is needed for the invalidate handler, as all CPU's will invalidate at -the same time without any locks. - -Interrupt 13 floating point error reporting is removed. This facility is -not usable on a multiprocessor board, nor relevant to the Intel MP -architecture which does not cover the 80386/80387 processor pair. \ - -The /proc filesystem support is changed so that the /proc/cpuinfo file -contains a column for each processor present. This information is extracted -from the data saved by smp\_store\_cpu\_info(). - -\end{document} _