INTERRUPT

  • WeChat official account: Rand_cs

Interrupts are the mechanism through which hardware and software interact; it is fair to say that the whole operating system, and indeed the whole architecture, is interrupt-driven. The mechanism covers two kinds of events: interrupts and exceptions. An interrupt is usually an asynchronous event triggered by an I/O device, while an exception is a synchronous event raised while the CPU executes an instruction. This article mainly covers interrupts triggered by I/O peripherals. From start to finish, an interrupt generally passes through three stages: the device generates the interrupt signal, the interrupt controller translates the signal, and the CPU actually handles it.

This article uses xv6 as the example to explain the interrupt mechanism on a multiprocessor, following an interrupt through those three stages from beginning to end. The first stage, how a device generates the signal, is outside the scope of the operating system and beyond my expertise, so it is not covered. Each hardware peripheral has its own execution logic and its own trigger mechanism, such as edge triggering or level triggering. In general, the device sends an interrupt signal, the interrupt controller translates it and forwards it to the CPU, and the CPU then runs the interrupt service routine to handle the interrupt.

Interrupt controller

So what is an interrupt controller? It can be regarded as an agent for interrupts. There are many peripherals, and without such an agent each peripheral that wants the CPU to handle its interrupt would have to be wired directly to a CPU pin. CPU pins are precious, and it is impossible to dedicate that many of them to peripherals. Hence the interrupt controller, the spokesperson for interrupts, to which all I/O peripherals are connected. To raise an interrupt request, a peripheral signals the interrupt controller, and the interrupt controller notifies the CPU, which solves the problem.

There are several kinds of interrupt controllers. The PIC only serves single-processor systems and cannot cope with today's multi-core, multi-processor machines, so a more advanced controller was introduced: the APIC (Advanced Programmable Interrupt Controller). The APIC has two parts, the LAPIC and the IOAPIC. The LAPIC sits inside the CPU, one per CPU. The IOAPIC is connected to the peripherals. An interrupt signal sent by a peripheral is processed by the IOAPIC and forwarded to a LAPIC, which decides whether to hand it to its CPU for actual handling.

So each CPU has its own LAPIC, the IOAPIC is part of the system chipset, and interrupt messages travel between them over the bus. The APIC is very complex; for a detailed description see Volume 3 of the Intel developer's manual. This article does not go into the details and only walks through the interrupt flow in APIC mode at a higher, more abstract level.

When the computer starts, the APIC must be initialized before it can be used correctly. Let's look at how the APIC is initialized in a relatively simple working mode:

IOAPIC

Initializing the IOAPIC means setting its registers.

The register indexes are defined as follows:

#define REG_ID     0x00  // Register index: ID
#define REG_VER    0x01  // Register index: version
#define REG_TABLE  0x10  // Register index: redirection table base

However, these registers are not directly accessible, and need to be read and written through two other registers mapped to memory.

Two memory-mapped registers

These two registers are memory-mapped: IOREGSEL at address 0xFEC0 0000 and IOWIN at address 0xFEC0 0010. IOREGSEL selects the register to access, and the actual read or write then goes through IOWIN. This is the commonly used index/data (or address/data) access method: the index port specifies the register, and the data port reads or writes it. The data port acts like a window onto all the registers.

Memory mapping means treating these registers as part of memory: reading or writing the memory address reads or writes the register, so ordinary memory-access instructions such as MOV can be used. The alternative is I/O port mapping, which treats the peripheral's I/O ports (some of its registers) as a separate address space. That space cannot be reached with memory-access instructions and requires the dedicated IN/OUT instructions.

All IOAPIC registers can be reached through IOREGSEL and IOWIN, so the structure ioapic is defined as follows:

struct ioapic {
  uint reg;       // IOREGSEL
  uint pad[3];    // 12 bytes of padding
  uint data;      // IOWIN
};

The 12 bytes of padding are there because IOREGSEL sits at 0xFEC0 0000 and is 4 bytes long, while IOWIN sits at 0xFEC0 0010. That leaves a 12-byte gap between the end of IOREGSEL and the start of IOWIN, so padding the structure with 12 bytes places the data field exactly on IOWIN.
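As a quick sanity check of this layout, here is a minimal sketch that mirrors the structure with fixed-width types and computes the field offsets. The names ioapic_layout, reg_offset, and data_offset are illustrative and not part of xv6, and the sketch assumes that xv6's uint is 32 bits wide, as it is on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Mirror of xv6's struct ioapic, using fixed-width types
// (assumption: xv6's uint is 32 bits wide, as on x86).
struct ioapic_layout {
  uint32_t reg;     // IOREGSEL, offset 0x00
  uint32_t pad[3];  // 12 bytes of padding, offsets 0x04-0x0F
  uint32_t data;    // IOWIN, offset 0x10
};

// Compile-time check: IOWIN must start 16 bytes after IOREGSEL.
static_assert(offsetof(struct ioapic_layout, data) == 0x10,
              "data must land on IOWIN");

// Offsets of the two windows relative to the IOAPIC base address.
static size_t reg_offset(void)  { return offsetof(struct ioapic_layout, reg); }
static size_t data_offset(void) { return offsetof(struct ioapic_layout, data); }
```

Casting the base address 0xFEC0 0000 to a pointer to such a structure therefore makes the reg field alias IOREGSEL and the data field alias IOWIN.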

Select a register with IOREGSEL, and then read and write the corresponding register from IOWIN, so you can also understand the following two read and write functions:

static uint ioapicread(int reg)
{
  ioapic->reg = reg;    // Select register reg
  return ioapic->data;  // Read the selected register through the window register
}

static void ioapicwrite(int reg, uint data)
{
  ioapic->reg = reg;    // Select register reg
  ioapic->data = data;  // Writing to the window register is equivalent to writing to the register reg
}

These two functions read and write the IOAPIC registers using the index/data method. Next, let's look at what the IOAPIC registers mean, so that the reasons behind each initialization step become clear. Only the registers used by xv6 are described below; for the others, see the links at the end of the article.
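The index/data pattern can be tried without real hardware. The sketch below is a software stand-in, not xv6 code: fake_ioapic, fake_read, and fake_write are hypothetical names, and the private array only imitates the registers that sit behind the window.

```c
#include <assert.h>
#include <stdint.h>

#define NREGS 0x40  // the register indexes discussed here fit in 0x00-0x3F

// A software stand-in for the IOAPIC (hypothetical, for illustration only):
// a private register file plus the selector that IOREGSEL would hold.
struct fake_ioapic {
  uint32_t sel;          // plays the role of IOREGSEL
  uint32_t regs[NREGS];  // the registers that are not directly addressable
};

// Index/data read: select the register, then read it through the window.
static uint32_t fake_read(struct fake_ioapic *dev, uint32_t reg)
{
  assert(reg < NREGS);
  dev->sel = reg;              // corresponds to writing IOREGSEL
  return dev->regs[dev->sel];  // corresponds to reading IOWIN
}

// Index/data write: select the register, then write it through the window.
static void fake_write(struct fake_ioapic *dev, uint32_t reg, uint32_t data)
{
  assert(reg < NREGS);
  dev->sel = reg;              // corresponds to writing IOREGSEL
  dev->regs[dev->sel] = data;  // corresponds to writing IOWIN
}
```

Every access costs two steps, which is the price of exposing many registers through only two memory-mapped addresses.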

IOAPIC register

ID Register

  • The index is 0

  • Bits 24-27: the IOAPIC ID

Version Register

  • The index is 1

  • Bits 0-7: the version

  • Bits 16-23: the index of the last redirection entry, which is 23 (counting from 0)

Redirection entry

The IOAPIC has 24 pins. Each pin has a 64-bit redirection entry (effectively a 64-bit register), and the entries occupy register indexes 0x10-0x3F. The format of a redirection entry is as follows:

This summary comes from ZX_WING's book Interrupt in Linux; it is very comprehensive and also very complicated. Some of the fields are explained below together with the initialization code.

IOAPIC initialization

#define IOAPIC  0xFEC00000   // Default physical address of IO APIC

void ioapicinit(void)
{
  int i, id, maxintr;

  ioapic = (volatile struct ioapic*)IOAPIC;      // Address of IOREGSEL
  maxintr = (ioapicread(REG_VER) >> 16) & 0xFF;  // Bits 16-23 of the version register: index of the last redirection entry
  id = ioapicread(REG_ID) >> 24;                 // Bits 24-27 of the ID register: IOAPIC ID
  if(id != ioapicid)
    cprintf("ioapicinit: id isn't equal to ioapicid; not a MP\n");

  // Mark all interrupts edge-triggered, active high, disabled,
  // and not routed to any CPUs.
  for(i = 0; i <= maxintr; i++){
    ioapicwrite(REG_TABLE+2*i, INT_DISABLED | (T_IRQ0 + i));  // Low 32 bits; each entry is 64 bits wide, hence 2*i
    ioapicwrite(REG_TABLE+2*i+1, 0);                          // High 32 bits
  }
}

The macro IOAPIC is an address: the location where the IOREGSEL register is memory-mapped. Using the index/data method, the code reads information such as the ID and the number of supported interrupts.

The IOAPIC ID is recorded in the MP Configuration Table entries. The MP Table was mentioned in @@@@@@@@@@@. Simply put, it contains various kinds of entries that record configuration information for a multiprocessor system, and the computer can read useful information from it at boot. The article @@@@ only showed processor-type entries, whose count tells how many processors there are. There are also IOAPIC-type entries, and each of them records the ID of its IOAPIC. That is all we will say about the MP Table here; interested readers can message the official account to get the MP Spec document, which explains it in detail.

This is followed by a for loop that initializes the 24 redirection entries. Let's see what it sets:

  • T_IRQ0 + i is the interrupt vector number; each vector number identifies one interrupt. It means that this redirection entry handles the interrupt with vector T_IRQ0 + i.

  • #define INT_DISABLED 0x00010000: setting this bit masks the interrupt associated with the redirection entry. That is, when a hardware peripheral sends an interrupt signal to the IOAPIC, the IOAPIC simply ignores it.
  • Bit 13 and bit 15 are set to 0, meaning that the pin is active high and the trigger mode is edge-triggered. These are digital-logic concepts, and the basics are worth knowing.
  • Bit 11 is set to 0 for Physical Mode, and the 8-bit Destination Field at the top of the entry is set to 0. In Physical Mode the Destination Field holds a LAPIC ID, and a LAPIC ID uniquely identifies a CPU, so the Destination Field names the CPU to which the interrupt is routed.

So this initialization sets every redirection entry to edge-triggered and active high, routes every interrupt to CPU0, and masks them all. The xv6 comment says the interrupts are not routed to any processor, which I think is inaccurate: although every interrupt is masked, a Destination Field of 0 still routes it to CPU0.
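To make the bit positions concrete, the following sketch rebuilds the low 32 bits that the initialization loop writes and pulls the individual fields back out. The helper names (masked_entry_lo and the entry_* extractors) are made up for this illustration; the bit positions are the ones listed above, and T_IRQ0 is taken to be 32 as in xv6.

```c
#include <assert.h>
#include <stdint.h>

#define T_IRQ0        32          // first vector used for external interrupts
#define INT_DISABLED  0x00010000  // mask bit (bit 16)

// Low 32 bits that the initialization loop writes for a given pin.
static uint32_t masked_entry_lo(int pin)
{
  assert(pin >= 0 && pin < 24);
  return INT_DISABLED | (uint32_t)(T_IRQ0 + pin);
}

// Field extractors for the low 32 bits of a redirection entry.
static unsigned entry_vector(uint32_t lo)    { return lo & 0xFF; }
static unsigned entry_dest_mode(uint32_t lo) { return (lo >> 11) & 1; }  // 0 = physical
static unsigned entry_polarity(uint32_t lo)  { return (lo >> 13) & 1; }  // 0 = active high
static unsigned entry_trigger(uint32_t lo)   { return (lo >> 15) & 1; }  // 0 = edge
static unsigned entry_masked(uint32_t lo)    { return (lo >> 16) & 1; }  // 1 = masked

// Destination Field: the top 8 bits of the high 32 bits.
static unsigned entry_dest(uint32_t hi)      { return hi >> 24; }
```

For pin 14, for example, masked_entry_lo returns 0x0001002E: vector 46, physical mode, active high, edge-triggered, masked.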

T_IRQ0 is a macro with the value 32. The first 32 interrupt vector numbers are assigned to exceptions or reserved. Vectors 32 to 255 are available for external interrupts and the INT n instruction.

After this initialization, the interrupt on every pin is masked. For an interrupt to work, it has to be enabled at some point, which is what the following function does:

void ioapicenable(int irq, int cpunum)
{
  // Mark interrupt edge-triggered, active high,
  // enabled, and routed to the given cpunum,
  // which happens to be that cpu's APIC ID.
  ioapicwrite(REG_TABLE+2*irq, T_IRQ0 + irq);
  ioapicwrite(REG_TABLE+2*irq+1, cpunum << 24);  // Shift left by 24 bits to fill the Destination Field
}

T_IRQ0 + irq is the interrupt vector number. It is written to the low 8-bit vector field and indicates which interrupt this redirection entry handles.

cpunum is the CPU number. mp.c defines a global array that stores information about every CPU. In xv6 the index into this array is cpunum, which is also the LAPIC ID and uniquely identifies a CPU. Destination Mode was set to 0 during initialization and this function leaves it at 0, so writing cpunum into the Destination Field routes the interrupt to that CPU.

As a simple test, look at the call to ioapicenable() in ideinit():

ioapicenable(IRQ_IDE, ncpu - 1);     // Let the last CPU handle disk interrupts

This call routes disk interrupts to the last CPU. A small experiment confirms it.

First set the number of CPUs in the Makefile to more than one. I set it to 4:

ifndef CPUS
CPUS := 4
endif

Then add a print statement (cprintf) to trap.c:

case T_IRQ0 + IRQ_IDE:    // Disk interrupt
    ideintr();            // Call the disk interrupt handler
    lapiceoi();           // Write EOI to signal that handling is complete
    cprintf("ide %d\n", cpuid());  // Print the number of the handling CPU
    break;

This code is covered in more detail later; for now the comments should make it easy to follow.

With 4 CPUs, the CPU that handles the disk interrupt is number 3, as expected. So much for IOAPIC initialization; next comes LAPIC initialization.

LAPIC

The LAPIC is much more complex than the IOAPIC. Here is an overview diagram:

xv6 does not go into that much complexity. The LAPIC's main job is to receive interrupt messages from the IOAPIC and deliver them to the CPU. It can also act as an interrupt source itself, generating interrupt messages and sending them to its own CPU or to other CPUs. As with the IOAPIC, initializing the LAPIC means setting the relevant registers, but the LAPIC has far too many of them. This article only describes the registers used by xv6; for the rest, please refer to @@@@@@@@@@@ or the links at the end of this article.

The LAPIC registers are memory-mapped, and the default base address is generally 0xFEE0 0000. We do not pick this address ourselves, however: the actual base address is read from the MP Table Header. See @@@@@@@@@@ at the end of this article. The lapic address can therefore be defined and obtained as follows:

/* lapic.c */
volatile uint *lapic;  // Initialized in mp.c

/* mp.c */
lapic = (uint*)conf->lapicaddr;  // conf is the MP Table Header, which records the LAPIC address

lapic can also be viewed as an array of uint, 4 bytes per element, so the index of each register is its byte offset divided by 4. For example, the ID register is at offset 0x20 from the lapic base address, so its index in the lapic array is 0x20/4. The offsets of the registers are listed in the link at the end of this article.
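The offset-to-index arithmetic is small enough to check directly. This is a sketch with made-up helper names (lapic_index, lapic_reg_addr); the base address passed in is only the usual default, since xv6 reads the real one from the MP Table Header.

```c
#include <assert.h>
#include <stdint.h>

// Convert a LAPIC register's byte offset into an index into a uint32_t array.
static unsigned lapic_index(unsigned byte_offset)
{
  assert(byte_offset % 4 == 0);  // registers are accessed as aligned 32-bit words
  return byte_offset / (unsigned)sizeof(uint32_t);
}

// Physical address that an access to lapic[index] touches, given the base
// address (assumption: the caller supplies the base, e.g. the default 0xFEE00000).
static uint32_t lapic_reg_addr(uint32_t base, unsigned index)
{
  return base + index * (uint32_t)sizeof(uint32_t);
}
```

So the ID register at offset 0x20 is lapic[8], and with the default base that access lands on address 0xFEE0 0020.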

Because the LAPIC registers are memory-mapped, setting a register is just a read or write of the corresponding memory, so the implementation is very simple:

static void lapicw(int index, int value)   // Write value to the register at index
{
  lapic[index] = value;
  lapic[ID];  // wait for write to finish, by reading
}

This looks like a memory write, but that address range actually belongs to the LAPIC. After writing to hardware, software generally has to wait a moment for the write to complete; the initialization code for the disk, keyboard, and other hardware contains similar waits. Here the wait is done simply by reading a register back.

LAPIC initialization

LAPIC initialization is done in lapicinit(). Let's go through it piece by piece:

lapicw(SVR, ENABLE | (T_IRQ0 + IRQ_SPURIOUS));

#define SVR     (0x00F0/4)   // Spurious Interrupt Vector
#define ENABLE  0x00000100   // Unit Enable

The SVR is the spurious interrupt vector register. Each time the CPU responds to a maskable interrupt on INTR, it executes two consecutive INTA cycles. As the MP Spec describes, if the interrupt becomes inactive after the first INTA cycle and before the second, it is a spurious interrupt. In other words, a spurious interrupt occurs when the interrupt pin does not hold its active level long enough. This is mostly an electrical matter, and a rough understanding is enough.

The SVR has other fields as well. Setting bit 8 to 1 enables the LAPIC, and the LAPIC must be enabled in order to work.

lapicw(TDCR, X1);                                // Set the divide configuration
lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));  // Set the timer mode and interrupt vector number
lapicw(TICR, 10000000);                          // Set the initial count

#define TICR     (0x0380/4)   // Timer Initial Count
#define TDCR     (0x03E0/4)   // Timer Divide Configuration
#define TIMER    (0x0320/4)   // Local Vector Table 0 (TIMER)
#define X1       0x0000000B   // divide counts by 1
#define PERIODIC 0x00020000   // Periodic

The LAPIC comes with a programmable timer that can serve as a clock and trigger clock interrupts. It is configured through the TDCR (Divide Configuration Register), the TICR (Initial Count Register), and the LVT Timer Register. There is also a Current Count Register, which the xv6 code shown here does not use.

The Timer register is one entry of the Local Vector Table (LVT), which covers the interrupts that the LAPIC itself can generate, as listed above.

In the Timer register, bit 17 and bit 18 select the Timer Mode. xv6 sets them to 01, which is periodic mode: as the name suggests, the counter counts down from a given number to 0, then starts over, and so on.

That number is set in the TICR register; xv6 sets it to 10000000.

The counter decrements at a certain frequency, which is the system bus frequency divided by a divider. The divider is set in the TDCR register; xv6 sets it to divide by 1, which is equivalent to no division, so the bus frequency is used directly.

T_IRQ0 + IRQ_TIMER is the clock interrupt vector number, set in the low 8 bits of the Timer register.

That covers the clock interrupt setup. Every CPU has its own LAPIC, so a clock interrupt occurs on every CPU, unlike other interrupts, which are assigned to one CPU for handling.
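The countdown-and-reload behaviour of periodic mode can be modelled in a few lines. This is a toy model rather than a description of the hardware: the structure toy_timer and its functions are hypothetical, and one call to toy_timer_tick stands for one bus cycle.

```c
#include <assert.h>
#include <stdint.h>

// Toy model of the LAPIC timer in periodic mode (illustrative only).
struct toy_timer {
  uint32_t initial;   // value written to TICR
  uint32_t current;   // current count
  uint32_t divide;    // divider selected through TDCR (1 means no division)
  uint32_t prescale;  // bus cycles seen since the last decrement
  uint64_t fired;     // number of timer interrupts raised so far
};

static void toy_timer_init(struct toy_timer *t, uint32_t initial, uint32_t divide)
{
  assert(initial > 0 && divide > 0);
  t->initial = initial;
  t->current = initial;
  t->divide = divide;
  t->prescale = 0;
  t->fired = 0;
}

// Advance the model by one bus cycle.
static void toy_timer_tick(struct toy_timer *t)
{
  if (++t->prescale < t->divide)
    return;                 // not yet time to decrement
  t->prescale = 0;
  if (--t->current == 0) {  // reached zero: raise the interrupt and reload
    t->fired++;
    t->current = t->initial;
  }
}
```

In this model, an initial count of 10000000 with a divider of 1 raises one interrupt every 10000000 bus cycles; how long that is in wall-clock time depends on the bus frequency of the machine.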

Back to initialization of LAPIC:

// Disable logical interrupt lines.
lapicw(LINT0, MASKED);
lapicw(LINT1, MASKED);

LINT0 and LINT1 are connected to the i8259A and NMI, but in practice they are only wired up on the BSP (the bootstrap CPU), so only the BSP can receive these two interrupts. Normally the BSP sets LINT0 to ExtINT mode and LINT1 to NMI mode, while an AP simply sets the mask bit to mask both. xv6 simplifies matters by using only APIC mode, so every LAPIC masks both interrupts.

if(((lapic[VER]>>16) & 0xFF) >= 4)
  lapicw(PCINT, MASKED);

// Map error interrupt to IRQ_ERROR.
lapicw(ERROR, T_IRQ0 + IRQ_ERROR);

// Clear error status register (requires back-to-back writes).
lapicw(ESR, 0);
lapicw(ESR, 0);

#define VER     (0x0030/4)   // Version
#define ERROR   (0x0370/4)   // Local Vector Table 3 (ERROR)
#define PCINT   (0x0340/4)   // Performance Counter LVT
#define ESR     (0x0280/4)   // Error Status

Bits 16-23 of the Version Register hold the index of the last LVT (local interrupt) entry. If it is 4 or more, the performance counter overflow interrupt is masked. The reason is described in Volume 3 of the Intel manual; after reading it I am still confused, and it does not seem to matter in everyday use.

The ERROR register maps the error interrupt, which is triggered when the APIC detects an internal error, to the interrupt vector number T_IRQ0 + IRQ_ERROR.

The ESR (Error Status Register) records the error status. Initialization clears it.

lapicw(EOI, 0);

#define EOI     (0x00B0/4)   // EOI

EOI (End of Interrupt): after interrupt handling is complete, software writes the EOI register to signal that the interrupt has been handled. At reset and initialization its value should be 0.

lapicw(ICRHI, 0);
lapicw(ICRLO, BCAST | INIT | LEVEL);
while(lapic[ICRLO] & DELIVS)
  ;

#define ICRHI    (0x0310/4)   // Interrupt Command [63:32]
#define TIMER    (0x0320/4)   // Local Vector Table 0 (TIMER)
#define INIT     0x00000500   // INIT/RESET
#define STARTUP  0x00000600   // Startup IPI
#define DELIVS   0x00001000   // Delivery status
#define ASSERT   0x00004000   // Assert interrupt (vs deassert)
#define DEASSERT 0x00000000
#define LEVEL    0x00008000   // Level triggered
#define BCAST    0x00080000   // Send to all APICs, including self.
#define BUSY     0x00001000
#define FIXED    0x00000000

ICR (Interrupt Command Register): when one CPU wants to send an interrupt to another CPU, it fills the interrupt vector and the target LAPIC's identity into the ICR, and a message is sent over the bus to the target LAPIC. The ICR's fields resemble those of an IOAPIC redirection entry, since in both cases an interrupt message is being sent to a LAPIC. Both have a Destination Field, Delivery Mode, Destination Mode, Level, and so on.

This statement sends an Init Level De-Assert to synchronise arbitration IDs. According to the Intel manual, its effect is to set the Arb ID of every CPU's APIC to its initial value, the APIC ID.

Regarding the Arb, here is the explanation from Interrupt in Linux:

Arb, the Arbitration Register. The register uses 4 bits to represent 16 priorities from 0 to 15 (15 being the highest) and determines a LAPIC's priority when competing for the APIC bus. After a system RESET, the Arb of each LAPIC is initialized to its LAPIC ID. During bus contention, the LAPIC with the largest Arb wins the bus, clears its own Arb to zero, and the Arb of every other LAPIC is incremented by one. Arb arbitration is therefore a round-robin mechanism. A level-triggered INIT IPI resynchronizes the Arb of each LAPIC to its current LAPIC ID.
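The round-robin effect of this rule is easy to see in a small simulation. The sketch below follows the quoted description literally and nothing more; the number of LAPICs (NLAPIC) and the function names arb_reset and arb_round are assumptions made for the illustration.

```c
#include <assert.h>

#define NLAPIC 4  // number of LAPICs in this toy model (assumption)

// Reset every Arb to its LAPIC ID, as after RESET or a level-triggered INIT IPI.
static void arb_reset(unsigned arb[NLAPIC])
{
  for (int i = 0; i < NLAPIC; i++)
    arb[i] = (unsigned)i;
}

// One round of bus arbitration: the LAPIC with the largest Arb wins,
// its Arb is cleared, and every other Arb is incremented by one.
// Returns the index (LAPIC ID) of the winner.
static int arb_round(unsigned arb[NLAPIC])
{
  int winner = 0;
  for (int i = 1; i < NLAPIC; i++)
    if (arb[i] > arb[winner])
      winner = i;
  for (int i = 0; i < NLAPIC; i++)
    arb[i] = (i == winner) ? 0 : arb[i] + 1;
  return winner;
}
```

Starting from Arb values 0, 1, 2, 3, successive rounds are won by LAPIC 3, 2, 1, 0, and then 3 again, so every LAPIC gets the bus in turn.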

// Enable interrupts on the APIC (but not on the processor).
lapicw(TPR, 0);

#define TPR     (0x0080/4)   // Task Priority

The Task Priority Register determines which priority levels of interrupts the CPU can currently handle. The CPU only handles interrupts whose priority is higher than the value in the TPR; lower-priority interrupts are temporarily held back, that is, they keep waiting in the IRR.
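A simplified version of this comparison is sketched below. It assumes the scheme from the Intel manual in which the priority class of an interrupt is the upper 4 bits of its vector and the TPR holds its class in the same position; it deliberately ignores interrupts that are already in service, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Priority class of a vector or of the TPR value: its upper 4 bits.
static unsigned priority_class(uint8_t value)
{
  return value >> 4;
}

// Simplified check: the interrupt is passed on only if its priority class
// is strictly higher than the class held in the TPR. Interrupts that are
// already in service are not taken into account here.
static int passes_tpr(uint8_t vector, uint8_t tpr)
{
  assert(priority_class(tpr) <= 15);
  return priority_class(vector) > priority_class(tpr);
}
```

With the TPR set to 0, as xv6 does, every vector from 16 upward passes the check, which includes all vectors from T_IRQ0 onward.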

That is the LAPIC initialization in xv6. It is the simple version, and yet it still involves quite a lot. Let's look at a few other functions in lapic.c:

int lapicid(void)   // Return the CPU/LAPIC ID
{
  if (!lapic)
    return 0;
  return lapic[ID] >> 24;
}

This function returns the LAPIC ID, which is stored in the ID register starting at bit 24. Each CPU corresponds to one LAPIC, so this is also the CPU ID, which in turn is the index into the CPU array. The cpuid() function seen in the IOAPIC section is a wrapper around this function.

void lapiceoi(void)
{
  if(lapic)
    lapicw(EOI, 0);
}

This function is used frequently in interrupt service routines. To understand it, let's look at two important registers in the LAPIC:

  • IRR (Interrupt Request Register): 256 bits, one per interrupt vector. When an interrupt message arrives and the interrupt is not masked, the corresponding IRR bit is set to 1, indicating that the request has been received but not yet handled by the CPU.

  • ISR (In-Service Register): 256 bits, one per interrupt vector. When a request in the IRR is dispatched to the CPU, the corresponding ISR bit is set to 1, indicating that the CPU is handling the interrupt.

That covers APIC initialization and some important functions. With this background, here is the general flow of an interrupt through the APIC:

  1. The peripheral triggers an interrupt and sends an interrupt signal to the IOAPIC.
  2. The IOAPIC uses the PRT table to translate the interrupt signal into an interrupt message and sends it to the LAPIC named in the Destination Field.
  3. The LAPIC uses the Destination Mode and Destination Field in the message, together with its own ID register, to decide whether to accept the message. If it accepts, it sets the corresponding bit in the IRR; otherwise it ignores the message.
  4. When the CPU is able to handle the next interrupt, the highest-priority interrupt is selected from the IRR, the corresponding ISR bit is set to 1, and the interrupt is delivered to the CPU for execution.
  5. The CPU executes the interrupt service routine to handle the interrupt.
  6. When handling is complete, software writes EOI to signal completion. Writing EOI clears the corresponding ISR bit to 0. For level-triggered interrupts, an EOI message is also sent to all IOAPICs to notify them that handling has completed.
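Steps 3, 4, and 6 of this flow can be sketched as operations on two bitmaps. This is a toy model with hypothetical names, not LAPIC or xv6 code. It assumes that a higher vector number means higher priority and that dispatching a request clears its IRR bit, and it leaves out masking, the TPR, and nesting.

```c
#include <assert.h>
#include <stdint.h>

// Toy LAPIC state: two 256-bit bitmaps, one bit per vector.
struct toy_lapic {
  uint32_t irr[8];  // requests received but not yet dispatched
  uint32_t isr[8];  // interrupts currently being serviced
};

static void set_bit(uint32_t *map, int v)   { map[v / 32] |=  (1u << (v % 32)); }
static void clear_bit(uint32_t *map, int v) { map[v / 32] &= ~(1u << (v % 32)); }
static int  test_bit(const uint32_t *map, int v) { return (map[v / 32] >> (v % 32)) & 1; }

// Highest set bit in a bitmap, or -1 if the bitmap is empty.
static int highest(const uint32_t *map)
{
  for (int v = 255; v >= 0; v--)
    if (test_bit(map, v))
      return v;
  return -1;
}

// Step 3: an accepted interrupt message sets its IRR bit.
static void toy_accept(struct toy_lapic *l, int vector)
{
  assert(vector >= 0 && vector < 256);
  set_bit(l->irr, vector);
}

// Step 4: move the highest-priority pending vector from the IRR to the ISR.
// Returns the dispatched vector, or -1 if nothing is pending.
static int toy_dispatch(struct toy_lapic *l)
{
  int v = highest(l->irr);
  if (v >= 0) {
    clear_bit(l->irr, v);
    set_bit(l->isr, v);
  }
  return v;
}

// Step 6: writing EOI clears the highest in-service bit.
static void toy_eoi(struct toy_lapic *l)
{
  int v = highest(l->isr);
  if (v >= 0)
    clear_bit(l->isr, v);
}
```

If vectors 33 and 46 are both pending, toy_dispatch hands out 46 first; after toy_eoi the next call hands out 33.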

The flow above is only a very simple outline. It does not cover non-maskable interrupts, special interrupts, interrupt nesting, and so on; it is just meant to give a basic idea of how the APIC works during an interrupt. Next we focus on the CPU's part of interrupt handling.

The CPU part

Everything so far is APIC initialization, which main() in main.c calls as part of setting up the environment at boot. Now let's look at how the CPU handles interrupts. First, a rough review of the steps:

  • The CPU receives the interrupt vector number from the interrupt controller.
  • It indexes the IDT with the vector number to find the gate descriptor, then indexes the GDT with the segment selector in the gate descriptor to find the segment descriptor.
  • During this step the CPU checks privilege levels. If the privilege level changes, for example when entering kernel mode from user mode, it pushes the old stack's SS and ESP onto the kernel stack; if the level does not change, it does not push them. It then pushes EFLAGS, CS, and EIP, plus the error code if the interrupt has one.
  • It computes the address of the interrupt service routine from the segment base address in the segment descriptor and the offset in the gate descriptor.
  • It executes the interrupt service routine, which first pushes registers and other state to save the context.
  • After the routine finishes, the context is restored and EOI is written to signal that the interrupt is complete.

So some registers are pushed onto the stack before the interrupt is formally handled, as follows:

The next step is to index the IDT and GDT for the gate descriptor and segment descriptor and locate the interrupt service routine. Since this article is mainly about interrupts, we only look at the IDT here; the GDT was covered in @@@@@@@@@, which you can refer to.

Build the IDT

The IDT (Interrupt Descriptor Table) has to exist before the CPU can use the vector number sent by the interrupt controller to index a gate descriptor in it.

So we have to build an IDT, which means building interrupt descriptors, generally called gate descriptors. The IDT can hold several kinds of gate descriptors: task gate, interrupt gate, and trap gate descriptors. Most interrupts use interrupt gate descriptors. Let's look at their format:

Interrupt gate and trap gate descriptors share the same layout, which xv6 defines as follows:

struct gatedesc {
  uint off_15_0 : 16;   // low 16 bits of offset in segment
  uint cs : 16;         // code segment selector
  uint args : 5;        // # args, 0 for interrupt/trap gates
  uint rsv1 : 3;        // reserved(should be zero I guess)
  uint type : 4;        // type(STS_{IG32,TG32})
  uint s : 1;           // must be 0 (system)
  uint dpl : 2;         // descriptor(meaning new) privilege level
  uint p : 1;           // Present
  uint off_31_16 : 16;  // high bits of offset in segment
};

  • Bits 0-15: bits 0-15 of the interrupt service routine's offset within the target code segment
  • Bits 16-31: the segment selector of the segment that contains the interrupt service routine
  • Bits 40-43: the TYPE field, 1110 for an interrupt gate and 1111 for a trap gate
  • Bit 44: the S field, where 0 indicates a system segment. All gate structures are system segments, meaning structures required by the hardware. Non-system segments are the ones software needs, namely the familiar code and data segments.
  • Bits 45-46: DPL (Descriptor Privilege Level), used for the privilege check when entering the interrupt
  • Bit 47: P (Present), whether the segment is present in memory: 1 if present, 0 otherwise
  • Bits 48-63: bits 16-31 of the interrupt service routine's offset within the kernel code segment
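The field positions listed above can be checked by packing a descriptor by hand. The sketch below uses plain shifts on a 64-bit integer instead of bit-fields; make_gate and gate_offset are hypothetical helpers, and the offset and selector in the usage example are arbitrary sample values.

```c
#include <assert.h>
#include <stdint.h>

#define STS_IG32 0xE  // 32-bit interrupt gate (type 1110)
#define STS_TG32 0xF  // 32-bit trap gate (type 1111)

// Pack the fields into the 64-bit layout listed above.
static uint64_t make_gate(uint32_t off, uint16_t sel, unsigned type, unsigned dpl)
{
  assert(type <= 0xF && dpl <= 3);
  uint64_t g = 0;
  g |= (uint64_t)(off & 0xFFFF);     // bits 0-15: offset, low half
  g |= (uint64_t)sel << 16;          // bits 16-31: segment selector
  g |= (uint64_t)type << 40;         // bits 40-43: type
                                     // bit 44 (S) stays 0: system segment
  g |= (uint64_t)dpl << 45;          // bits 45-46: DPL
  g |= (uint64_t)1 << 47;            // bit 47: present
  g |= (uint64_t)(off >> 16) << 48;  // bits 48-63: offset, high half
  return g;
}

// Recover the handler offset from a packed gate.
static uint32_t gate_offset(uint64_t g)
{
  return (uint32_t)(g & 0xFFFF) | ((uint32_t)(g >> 48) << 16);
}
```

For instance, make_gate(0x80105678, 0x08, STS_IG32, 0) yields 0x80108E0000085678, in which the byte 0x8E combines P = 1, DPL = 0, S = 0, and the interrupt gate type 1110.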

As the fields above show, building an interrupt gate descriptor requires the address of the interrupt service routine, so the routines must be prepared first and their addresses obtained. In xv6, every interrupt has a small entry routine of the same form, and the address of that entry routine is what goes into the interrupt gate descriptor.

The IDT supports 256 entries and therefore 256 interrupts, so there are 256 entry routines. They all do similar work, so xv6 uses a Perl script, vectors.pl, to generate the code in bulk. The generated code looks like this:

.globl alltraps
.globl vector0
vector0:
  pushl $0
  pushl $0
  jmp alltraps
#############################
.globl vector8
vector8:
  pushl $8
  jmp alltraps
#############################
.globl vectors   # array of entry routines
vectors:
  .long vector0
  .long vector1
  .long vector2

This is assembly code in which every interrupt entry routine does the same two or three things:

  • Some interrupts push an error code onto the stack, so for consistency, interrupts without an error code push a placeholder: 0
  • Push its own interrupt vector number
  • Jump to alltraps to run the interrupt handling code

The first step is only performed for interrupts and exceptions that do not produce an error code. The main part of an error code is a selector, and it is generally not used. The Perl script handles the exceptions that do have error codes specially:

if(!($i == 8 || ($i >= 10 && $i <= 14) || $i == 17)){
    print "  pushl \$0\n";
}

That is, the exceptions with vector numbers 8, 10-14, and 17 produce an error code, so no 0 needs to be pushed for them.
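The same condition, written in C, makes it easy to see what each entry stub pushes. The function names are made up for this sketch; the vector set is exactly the one tested by the Perl script above.

```c
#include <assert.h>

// 1 if the CPU itself pushes an error code for this vector,
// following the condition in the Perl script above.
static int has_error_code(int vector)
{
  assert(vector >= 0 && vector < 256);
  return vector == 8 || (vector >= 10 && vector <= 14) || vector == 17;
}

// Number of 4-byte words the entry stub pushes by itself: the vector number,
// plus a dummy 0 when the CPU did not push an error code.
static int stub_pushes(int vector)
{
  return has_error_code(vector) ? 1 : 2;
}
```

Either way the stack ends up with the same shape by the time alltraps runs: an error code slot followed by the vector number.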

The 256 interrupt entry program addresses are written to a large array of VectorSvectorsvectors, so the address information required by the interrupt gate descriptor does not come immediately, so the IDT is constructed as follows:

    struct gatedesc idt[256];
    extern uint vectors[];  // In vectors.S: array of 256 entry pointers

    void
    tvinit(void)
    {
      int i;

      for(i = 0; i < 256; i++)
        SETGATE(idt[i], 0, SEG_KCODE<<3, vectors[i], 0);
      SETGATE(idt[T_SYSCALL], 1, SEG_KCODE<<3, vectors[T_SYSCALL], DPL_USER);

      initlock(&tickslock, "time");
    }

    #define SETGATE(gate, istrap, sel, off, d) \
    { \
      (gate).off_15_0 = (uint)(off) & 0xFFFF; \
      (gate).cs = (sel); \
      (gate).args = 0; \
      (gate).rsv1 = 0; \
      (gate).type = (istrap) ? STS_TG32 : STS_IG32; \
      (gate).s = 0; \
      (gate).dpl = (d); \
      (gate).p = 1; \
      (gate).off_31_16 = (uint)(off) >> 16; \
    }

The SETGATE macro simply builds a gate descriptor from the information it is given and should be easy to read.
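To see what the macro does to the handler address, here is a self-contained sketch in function form. set_gate and gate_offset are hypothetical names, the bit-field layout follows the macro above, and the 8-byte size assumes a compiler that packs bit-fields the way GCC and Clang do on x86.

```c
#include <assert.h>
#include <stdint.h>

#define STS_IG32 0xE   /* 32-bit interrupt gate */
#define STS_TG32 0xF   /* 32-bit trap gate */

/* Same fields that SETGATE fills in. */
struct gatedesc {
    unsigned int off_15_0  : 16;  /* low 16 bits of the handler offset */
    unsigned int cs        : 16;  /* code segment selector */
    unsigned int args      : 5;   /* unused for interrupt/trap gates */
    unsigned int rsv1      : 3;   /* reserved */
    unsigned int type      : 4;   /* STS_IG32 or STS_TG32 */
    unsigned int s         : 1;   /* 0 = system descriptor */
    unsigned int dpl       : 2;   /* gate privilege level */
    unsigned int p         : 1;   /* present */
    unsigned int off_31_16 : 16;  /* high 16 bits of the handler offset */
};

/* Hypothetical function form of SETGATE. */
void set_gate(struct gatedesc *g, int istrap, uint16_t sel,
              uint32_t off, unsigned dpl)
{
    assert(dpl <= 3);
    g->off_15_0 = off & 0xFFFF;
    g->cs = sel;
    g->args = 0;
    g->rsv1 = 0;
    g->type = istrap ? STS_TG32 : STS_IG32;
    g->s = 0;
    g->dpl = dpl;
    g->p = 1;
    g->off_31_16 = off >> 16;
}

/* Reassemble the 32-bit handler address from its two halves. */
uint32_t gate_offset(const struct gatedesc *g)
{
    return ((uint32_t)g->off_31_16 << 16) | g->off_15_0;
}
```

The handler address is split into two 16-bit halves that sit at opposite ends of the descriptor, which is why the macro writes off_15_0 and off_31_16 separately.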

The interrupt service routines are kernel code, so the segment selector is the kernel code segment and DPL is set to 0. The system call gate needs special treatment: its DPL field must be set to 3. The reason lies in the privilege check. The RPL (Request Privilege Level) field of the selector currently loaded in CS is the CPL (Current Privilege Level), so CPL = CS.RPL. Confusing? It can't be helped; that is how the architecture defines it.

What does the privilege check require? Numerically, CPL must be greater than or equal to the DPL of the code segment named by the selector in the gate descriptor, and for a software interrupt such as a system call, CPL must also be less than or equal to the DPL of the gate descriptor itself; otherwise a general protection exception is raised. In user mode CPL = 3. If the gate descriptor's DPL were still 0, the check would fail and an exception would be triggered. Therefore the DPL of the system call gate must be set to 3.

That is a bit of a digression. Privilege checking is a complicated topic, and no RPL check is involved here. After the IDT is built, its address must be loaded into the IDTR register so that the CPU knows where to find the IDT.
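The two comparisons can be sketched as one predicate. gate_entry_allowed is a hypothetical name, and the model is deliberately simplified: it assumes the gate DPL is only checked for software interrupts (int n), and it ignores every other check the CPU performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the privilege check on entry through a gate.
   cpl           - current privilege level (0..3)
   gate_dpl      - DPL field of the gate descriptor
   target_cs_dpl - DPL of the code segment the gate's selector names
   software_int  - true for "int n", false for hardware interrupts */
bool gate_entry_allowed(int cpl, int gate_dpl, int target_cs_dpl,
                        bool software_int)
{
    assert(cpl >= 0 && cpl <= 3);
    if (software_int && cpl > gate_dpl)
        return false;             /* caller is less privileged than the gate allows */
    return cpl >= target_cs_dpl;  /* handler runs at the same or a higher privilege */
}
```

With cpl = 3 and a software interrupt, a gate DPL of 3 passes and a gate DPL of 0 fails, which is the reason the system call gate is built with DPL_USER.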

    void
    idtinit(void)
    {
      lidt(idt, sizeof(idt));
    }

    static inline void
    lidt(struct gatedesc *p, int size)
    {
      // Build the 48-bit operand, then load it into the IDTR register.
      volatile ushort pd[3];

      pd[0] = size-1;
      pd[1] = (uint)p;
      pd[2] = (uint)p >> 16;

      asm volatile("lidt (%0)" : : "r" (pd));
    }

The IDTR register has 48 bits:

  • bit0-bit15 hold the IDT limit, that is, how large the table is. The largest value is 0xFFFF, i.e. 64 KB. A gate descriptor is 8 bytes, so the table could hold at most 64KB/8B = 8192 descriptors, but the processor only supports 256 interrupts, i.e. 256 gate descriptors.

  • bit16-bit47 hold the IDT base address.

The array pd in the code above is this 48-bit value. It is built first and then loaded into the IDTR register with the lidt instruction. For inline assembly, see my previous article.
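The arithmetic can be checked without touching the real register. The sketch below only builds the three 16-bit words; make_idtr_operand and max_gates_by_limit are hypothetical names, and the base address used in any call is made up.

```c
#include <assert.h>
#include <stdint.h>

enum { GATE_SIZE = 8 };   /* bytes per gate descriptor */

/* Fill the 6-byte operand of lidt:
   pd[0] = limit (table size - 1), pd[1..2] = 32-bit base address. */
void make_idtr_operand(uint16_t pd[3], uint32_t base, uint32_t size_bytes)
{
    assert(size_bytes >= 1 && size_bytes <= 0x10000);
    pd[0] = (uint16_t)(size_bytes - 1);
    pd[1] = (uint16_t)(base & 0xFFFF);
    pd[2] = (uint16_t)(base >> 16);
}

/* Largest number of gates a 16-bit limit can describe. */
unsigned max_gates_by_limit(void)
{
    return (0xFFFFu + 1u) / GATE_SIZE;
}
```

For a 256-entry table the limit is 256 * 8 - 1 = 0x7FF, far below the 0xFFFF the field could hold.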

Interrupt service routine

With the IDT ready, this section looks at the flow of the interrupt service routine itself. I divide it into three stages: interrupt entry, interrupt handling, and interrupt exit. Let's look at them one by one:

Interrupt entry routine

The interrupt entry routine mainly saves the interrupt context. The routines recorded in vectors are only part of the entry path. They do three things: push 0 (unless the CPU already pushed an error code), push the vector number, and jump to alltraps.

So the current stack situation is as follows:

Then the program jumps to alltraps. Let's see what it does:

    .globl alltraps
    alltraps:
      # Build trap frame.
      pushl %ds
      pushl %es
      pushl %fs
      pushl %gs
      pushal

      # Set up data segments.
      movw $(SEG_KDATA<<3), %ax
      movw %ax, %ds
      movw %ax, %es

      # Call trap(tf), where tf=%esp
      pushl %esp
      call trap
      addl $4, %esp

You can see that alltraps also does three things:

  • Build the trap frame to save the context
  • Set the data segment registers to the kernel data segment
  • Call trap() in trap.c

1. Create a stack frame to save the context

Building the trap frame to save the context means pushing the registers onto the stack, and xv6 bluntly pushes all of them. The segment registers are pushed first, then pushal pushes all the general-purpose registers in the order eax, ecx, edx, ebx, esp, ebp, esi, edi.

So now the stack is:

So the trap frame is defined like this:

    struct trapframe {
      // registers as pushed by pusha
      uint edi;
      uint esi;
      uint ebp;
      uint oesp;      // useless & ignored ESP value
      uint ebx;
      uint edx;
      uint ecx;
      uint eax;

      // rest of trap frame
      ushort gs;
      ushort padding1;
      ushort fs;
      ushort padding2;
      ushort es;
      ushort padding3;
      ushort ds;
      ushort padding4;
      uint trapno;

      // below here defined by x86 hardware
      uint err;
      uint eip;
      ushort cs;
      ushort padding5;
      uint eflags;

      // below here only when crossing rings, such as from user to kernel
      uint esp;
      ushort ss;
      ushort padding6;
    };

The trap frame structure corresponds one to one with the push operations above. Two points are worth noting:

  • A segment register is only 16 bits (2 bytes), but pushl pushes it as a double word (4 bytes), so a ushort of padding fills the other 2 bytes. The segment registers could also be declared directly as uint, with no padding fields.
  • The general-purpose registers pushed by pushal, together with the values the CPU pushes automatically on an interrupt, form the context of the process at the moment just before the interrupt. So why is the ESP pushed by pushal marked as useless and ignored? I'll leave that hanging for now and discuss it together with the stack issues later.
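The correspondence between pushes and fields can be checked with offsetof. The sketch below restates the structure with fixed-width types so that it compiles outside xv6; check_trapframe_layout is a hypothetical helper, and the expected numbers assume no padding beyond the explicit padding fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct trapframe {
    /* pushed by pushal (lowest addresses) */
    uint32_t edi, esi, ebp, oesp, ebx, edx, ecx, eax;
    /* pushed by alltraps */
    uint16_t gs, padding1;
    uint16_t fs, padding2;
    uint16_t es, padding3;
    uint16_t ds, padding4;
    /* pushed by the vector stub */
    uint32_t trapno;
    /* pushed by the CPU (err by the CPU or the stub) */
    uint32_t err;
    uint32_t eip;
    uint16_t cs, padding5;
    uint32_t eflags;
    /* pushed by the CPU only when crossing rings */
    uint32_t esp;
    uint16_t ss, padding6;
};

/* Each stage of the entry path accounts for a fixed number of bytes. */
void check_trapframe_layout(void)
{
    assert(offsetof(struct trapframe, edi) == 0);
    assert(offsetof(struct trapframe, gs) == 8 * 4);       /* 8 words from pushal */
    assert(offsetof(struct trapframe, trapno) == 12 * 4);  /* + 4 segment registers */
    assert(offsetof(struct trapframe, eip) == 14 * 4);     /* + trapno and err */
    assert(offsetof(struct trapframe, esp) == 17 * 4);     /* + eip, cs, eflags */
    assert(sizeof(struct trapframe) == 19 * 4);            /* + esp and ss */
}
```

Because the stack grows downward, the last value pushed (edi) is the first field, and the value passed to trap() points at it.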

2. Set the data segment register to the kernel data segment

The privilege check was already performed when the gate descriptor was looked up by vector number, and the segment selector in the gate descriptor, the kernel code segment selector, was loaded into CS. So here only the data segment registers need to be set to the kernel data segment. The extra segment register ES is usually set to the same value as the data segment; string instructions use the extra segment for the destination operand, as shown in the previous article on inline assembly.

3. Call the interrupt handler

A push followed by a call is a standard function call: push the argument, then call. For pushl %esp, ESP at this moment is the address of the top element of the stack, which is the start address of the trap frame. Then call trap invokes the interrupt handler: it pushes the return address (the address of the addl $4, %esp instruction) and jumps to trap().

At this point the stack is:
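A toy model makes the layout concrete. The sketch below replays the pushes for an interrupt taken in kernel mode (no stack switch) on an array that stands in for the kernel stack. All names are hypothetical and all register values are made up; only the order of the pushes follows the text above.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 64 };

struct sim {
    uint32_t mem[STACK_WORDS]; /* simulated stack, one slot per 32-bit word */
    int sp;                    /* index of the top slot; a push decrements it */
};

void sim_init(struct sim *s) { s->sp = STACK_WORDS; }

void push(struct sim *s, uint32_t v)
{
    assert(s->sp > 0);
    s->mem[--s->sp] = v;
}

/* Replays the entry path and returns the index where the trap frame starts. */
int enter_trap(struct sim *s, uint32_t trapno, uint32_t eip)
{
    int i, tf;

    /* 1. pushed by the CPU */
    push(s, 0x202);            /* eflags (made-up value) */
    push(s, 0x08);             /* cs: kernel code selector */
    push(s, eip);              /* interrupted instruction */
    /* 2. pushed by the vector stub */
    push(s, 0);                /* placeholder error code */
    push(s, trapno);
    /* 3. pushed by alltraps */
    for (i = 0; i < 4; i++)
        push(s, 0x10);         /* ds, es, fs, gs */
    for (i = 0; i < 8; i++)
        push(s, 0);            /* pushal: eax ... edi */
    tf = s->sp;                /* esp now points at the trap frame */
    push(s, (uint32_t)tf);     /* pushl %esp: argument of trap() */
    push(s, 0xdeadbeef);       /* call trap: return address (made up) */
    return tf;
}
```

Counting slots from the returned index, the vector number sits 12 words up and the saved eip 14 words up, matching the field order of struct trapframe.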

Interrupt handler

The operations above push the address of the trap frame, the argument needed by the interrupt handler trap(struct trapframe *tf), onto the stack. Like the entry routines, trap() is a common entry point: the function consists of many conditional branches and runs a different handler depending on the vector number in the trapframe. Let's look at a few:

    if(tf->trapno == T_SYSCALL){    // system call
      if(myproc()->killed)          // exit if the current process has been killed
        exit();
      myproc()->tf = tf;            // record the trap frame
      syscall();                    // perform the system call
      if(myproc()->killed)
        exit();
      return;
    }

If the vector number indicates a system call, the system call is carried out. System calls will be covered later.

    switch(tf->trapno){
    case T_IRQ0 + IRQ_TIMER:     // timer interrupt
      if(cpuid() == 0){
        acquire(&tickslock);
        ticks++;
        wakeup(&ticks);
        release(&tickslock);
      }
      lapiceoi();
      break;
    case T_IRQ0 + IRQ_IDE:       // disk interrupt
      ideintr();
      lapiceoi();
      break;
    /*****************************/

If it is a timer interrupt and it was raised on CPU0, ticks is incremented by 1. Each CPU has its own LAPIC and therefore its own APIC timer capable of triggering timer interrupts. ticks counts the timer ticks since the system started and serves as the system time: each timer interrupt increases it by 1. Only one CPU may modify ticks, because if every CPU incremented it on its own timer interrupt the count would be wrong. So CPU0, which is also the BSP, is chosen to update ticks. After handling, EOI is written to signal that the timer interrupt is complete.

If the interrupt comes from the disk, the disk interrupt handler is called, which is also the main body of the disk driver; see the previous article on the disk driver. After handling, EOI is written to signal completion.

Other interrupts are handled the same way, so I won't go through them one by one, and some interrupts are not covered here. They all follow this pattern: call a different handler according to the vector number, then write EOI to signal completion.
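The dispatch pattern can be tried in isolation. In the sketch below, handle_irq and struct trap_stats are hypothetical stand-ins: counters replace the real handlers and lapiceoi(), and the locking around ticks is left out. The vector constants are the ones xv6 uses.

```c
#include <assert.h>

#define T_IRQ0     32   /* IRQ 0 is mapped to vector 32 */
#define IRQ_TIMER   0
#define IRQ_IDE    14

/* Counters standing in for the real side effects. */
struct trap_stats {
    unsigned ticks;      /* system time, updated by CPU 0 only */
    unsigned disk_irqs;  /* stands in for ideintr() */
    unsigned eois;       /* stands in for lapiceoi() */
};

/* Dispatch on the vector number, then acknowledge the interrupt. */
void handle_irq(struct trap_stats *st, int trapno, int cpu)
{
    assert(trapno >= 0 && trapno < 256);
    switch (trapno) {
    case T_IRQ0 + IRQ_TIMER:
        if (cpu == 0)
            st->ticks++;     /* only CPU 0 advances the system time */
        st->eois++;          /* every CPU still acknowledges its own timer */
        break;
    case T_IRQ0 + IRQ_IDE:
        st->disk_irqs++;
        st->eois++;
        break;
    default:
        break;               /* other vectors omitted in this sketch */
    }
}
```

Delivering one timer interrupt to CPU 0 and one to CPU 1 advances ticks once but produces two EOIs, which is the behavior the text describes.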

Interrupt exit routine

After trap() returns, execution goes back to the assembly code in trapasm.S:

      # Call trap(tf), where tf=%esp
      pushl %esp
      call trap
      addl $4, %esp

      # Return falls through to trapret...
    .globl trapret        # interrupt exit
    trapret:
      popal
      popl %gs
      popl %fs
      popl %es
      popl %ds
      addl $0x8, %esp  # trapno and errcode
      iret

The interrupt exit routine is basically the reverse of the interrupt entry routine.

First, after trap() returns, the stack space occupied by its argument is released by moving ESP up 4 bytes. Systems code is usually a mix of assembly and C and uses the cdecl calling convention, which specifies that arguments are pushed from right to left, that EAX, ECX, and EDX are caller-saved, and that the caller cleans up the argument space on the stack. Why clean up the stack here? For the stack to be correct, the stack pointer must move up 4 bytes so that the following popal pops the right values.

After the argument is removed, the registers are popped one by one. When the vector number and error code are reached, ESP simply skips over them by adding 8 bytes.

The stack changes as follows:

Here are two points:

  • pop does not actually erase anything on the stack. Only ESP and the destination register of the pop change; the stack memory itself is left unchanged.
  • When was the return address skipped? By ret. This is mixed assembly and C programming: the C code of trap() in trap.c is compiled to end with ret, so the return address is popped when trap() finishes executing.

Now ESP points to EIP_OLD, and it is time to execute iret. iret checks whether a privilege level change is involved. If not, it pops EIP, CS, and EFLAGS. If the privilege level changes, ESP and SS are popped as well.
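The two cases differ only in how many words are consumed. The sketch below models that decision and nothing else: sim_iret and struct cpu are hypothetical, the frame is passed in as an array starting at the saved EIP, and the stack pointer adjustment in the same-privilege case is not modeled.

```c
#include <assert.h>
#include <stdint.h>

struct cpu { uint32_t eip, cs, eflags, esp, ss; };

/* frame[0] is the saved EIP. Returns how many words iret consumed. */
int sim_iret(struct cpu *c, const uint32_t *frame)
{
    int cpl = (int)(c->cs & 3);   /* privilege level before the return */
    int popped = 3;

    c->eip = frame[0];
    c->cs = frame[1];
    c->eflags = frame[2];
    assert((int)(c->cs & 3) >= cpl);   /* never returns to a more privileged level */
    if ((int)(c->cs & 3) != cpl) {     /* returning to an outer ring */
        c->esp = frame[3];
        c->ss = frame[4];
        popped = 5;
    }
    return popped;
}
```

With xv6's selectors, a saved CS of 0x1b (user code, RPL 3) makes the model consume five words and switch back to the user stack, while a saved CS of 0x08 consumes three and leaves SS and ESP alone.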

Once all the state of the original task has been restored, the interrupt is over and the original task continues.

This is the general flow of an interrupt, and not just in xv6: every x86-based system follows a similar flow. More complex operating systems handle interrupts with more elaborate steps, but the overall process is the same.

Take a look at the process diagram below:

The diagram mainly shows how the interrupt service routine is located. The actual handling is not drawn; it is enough to follow the stack changes described above, which the earlier figures should make clear, so I won't repeat them here. Speaking of the stack, some questions about it were left open above. They are answered here:

The problem of the stack

Finally, a few words about the stack, which has always been a confusing topic. I have always felt that if you can sort out the stack in an operating system, you basically understand the system. When an interrupt is entered and the privilege level changes, SS and ESP are pushed onto the kernel stack first, then EFLAGS, CS, and EIP.

This sentence seems fine, but have you ever wondered how the kernel stack is found? And after switching to the kernel stack, ESP already points into the kernel stack, yet the ESP that is pushed must be the top of the old stack from before the switch. How is the old value obtained so that it can be pushed? Likewise on the way out: if the registers were simply restored in stack order with popl %esp followed by popl %ss, wouldn't that go badly wrong?

The hardware architecture provides the way to switch to the kernel stack. There is a register called TR that holds the TSS segment selector. The selector indexes the TSS descriptor in the GDT, from which the TSS is located.

So what is the TSS? The TSS (Task State Segment) is a hardware-supported system data structure that stores, among other things, the SS and ESP of the stacks for privilege levels 0 to 2 (including the kernel's). When the privilege level changes, the SS and ESP of the kernel stack are fetched from there. This is only a brief introduction; what the TSS looks like, how it is initialized, and what else it is used for are not the focus of this article and will be covered later, together with processes.

Now the second question: how is the old stack information pushed onto the new stack? It is actually simple: save the old stack information somewhere first, then push it after the stack has been switched. The same applies when iret pops the stack information. Volume 2 of the Intel manual confirms that it is handled this way: its pseudocode clearly uses a temp as the staging location. What this temp is, the manual does not specify. Perhaps it is another internal register? I don't know, and I don't think it matters to dig that deep.
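The save, switch, push sequence can be modeled on a toy machine. Everything in the sketch below is hypothetical: memory is a small array addressed in bytes from 0, struct tss_stack keeps only the ring-0 stack fields of the TSS, and the temporaries are ordinary C variables standing in for whatever the processor uses internally.

```c
#include <assert.h>
#include <stdint.h>

enum { MEM_WORDS = 256 };

struct tss_stack { uint32_t esp0; uint16_t ss0; };  /* ring-0 stack from the TSS */

struct machine {
    uint32_t mem[MEM_WORDS];   /* toy memory, byte addresses 0..1023 */
    uint32_t esp, eip, eflags;
    uint16_t ss, cs;
};

static void push32(struct machine *m, uint32_t v)
{
    assert(m->esp >= 4 && m->esp <= MEM_WORDS * 4);
    m->esp -= 4;
    m->mem[m->esp / 4] = v;
}

/* Interrupt taken in user mode: switch stacks, then save the old context. */
void interrupt_from_user(struct machine *m, const struct tss_stack *tss,
                         uint16_t handler_cs, uint32_t handler_eip)
{
    /* 1. remember the old stack before it is lost */
    uint16_t temp_ss = m->ss;
    uint32_t temp_esp = m->esp;

    /* 2. switch to the kernel stack named by the TSS */
    m->ss = tss->ss0;
    m->esp = tss->esp0;

    /* 3. push the old context onto the new stack */
    push32(m, temp_ss);
    push32(m, temp_esp);
    push32(m, m->eflags);
    push32(m, m->cs);
    push32(m, m->eip);

    /* 4. enter the handler */
    m->cs = handler_cs;
    m->eip = handler_eip;
}
```

After the call, the five words at the top of the kernel stack are exactly what iret needs to return to user mode, with the old SS and ESP deepest in the stack.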

There is one more thing in this article worth explaining, though there is not much to it: pushal and popal, which push and pop the 8 general-purpose registers. Remember the comment on ESP in the trap frame structure? It says "useless & ignored". Why is that?

When pushal reaches ESP, it does not simply execute pushl %esp. The value of ESP is saved in temp before pushal pushes anything, and when it is ESP's turn, push temp is executed.

Accordingly, popal does not pop the saved temp value into ESP. Instead it adds 4 to ESP and skips that slot. If the value were popped into ESP, the stack pointer would jump: it should advance by one slot, 4 bytes, but it would skip over many bytes instead.

Here's a picture, where the crossed red lines mark the wrong behavior:

The pseudocode for pushal and popal is as follows:
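The following is a C rendering of that behavior on a toy machine, written from the description above rather than copied from the manual. sim_pushal, sim_popal, and struct toy_cpu are hypothetical names, and memory is a small array addressed in bytes.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_BYTES = 256 };

struct regs { uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi; };

struct toy_cpu {
    uint32_t mem[STACK_BYTES / 4];  /* toy stack memory */
    struct regs r;
};

static void push32(struct toy_cpu *c, uint32_t v)
{
    assert(c->r.esp >= 4 && c->r.esp <= STACK_BYTES);
    c->r.esp -= 4;
    c->mem[c->r.esp / 4] = v;
}

static uint32_t pop32(struct toy_cpu *c)
{
    uint32_t v;
    assert(c->r.esp + 4 <= STACK_BYTES);
    v = c->mem[c->r.esp / 4];
    c->r.esp += 4;
    return v;
}

void sim_pushal(struct toy_cpu *c)
{
    uint32_t temp = c->r.esp;   /* ESP as it was before the first push */
    push32(c, c->r.eax);
    push32(c, c->r.ecx);
    push32(c, c->r.edx);
    push32(c, c->r.ebx);
    push32(c, temp);            /* the "useless & ignored" slot */
    push32(c, c->r.ebp);
    push32(c, c->r.esi);
    push32(c, c->r.edi);
}

void sim_popal(struct toy_cpu *c)
{
    c->r.edi = pop32(c);
    c->r.esi = pop32(c);
    c->r.ebp = pop32(c);
    c->r.esp += 4;              /* skip the saved ESP instead of loading it */
    c->r.ebx = pop32(c);
    c->r.edx = pop32(c);
    c->r.ecx = pop32(c);
    c->r.eax = pop32(c);
}
```

If sim_popal loaded the saved slot into ESP instead of skipping it, ESP would jump straight back to its value from before pushal, and the four pops that follow would read whatever lies above the saved registers rather than ebx, edx, ecx, and eax.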

That is all about interrupts and the stack. When an interrupt changes the privilege level, the stack is switched, and the address of the kernel stack is found in the TSS. When the interrupt completes, all register state is restored, including the SS and ESP pushed first on entry (when the privilege level changed), so the stack is the user-mode stack again. If the interrupt occurs in kernel mode, the stack does not change. This is how xv6 does it; other systems may differ, but interrupt handling generally follows this process.

Of course this only covers an ordinary interrupt triggered by a peripheral. Special interrupts, nested interrupts, and enabling and disabling interrupts are not discussed. Interrupts are a big and complex topic; this article used xv6 to walk through how an ordinary peripheral interrupt is handled, which should now be reasonably clear. That's it for this article. If there are any mistakes, corrections are welcome, and you are also welcome to get in touch and compare notes.
