Osdev-Notes

APIC

What is APIC

APIC stands for Advanced Programmable Interrupt Controller, and it’s the device used to manage incoming interrupts to a processor core. It replaces the old PIC8259 (which is still available), and it offers more functionality, especially when dealing with SMP. In fact, one of the limitations of the PIC was that it could only deal with one CPU at a time, and this is also the main reason why the APIC was introduced.

It’s worth noting that Intel later developed a version of the APIC called the SAPIC for the Itanium platform. These are referred to collectively as the xapic, so if this term is used in documentation know that it just means the local APIC.

Types of APIC

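There are two types of APIC: the local APIC (LAPIC), of which there is one per processor core, and the I/O APIC, which collects interrupts from external devices and routes them to the local APICs.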

Both types of APIC are accessed by memory mapped registers, with 32-bit wide registers. They both have well-known base addresses, but rather than hardcoding these they should be fetched from the proper places as firmware (or even the bootloader) may move these around before our kernel boots.

Local APIC

When a system boots up, the CPU starts in PIC8259A emulation mode for legacy reasons. This simply means that instead of having the LAPIC and I/O APIC up and running, we have them working to emulate the old interrupt controller, so before we can use them properly we should disable the PIC8259 emulation.

Disabling The PIC8259

This part should be pretty straightforward, and we will not go deep into explaining the meaning of every command sent to it. The sequence of commands is:

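void disable_pic() {
    // ICW1: start the initialization sequence on both PICs
    outportb(PIC_COMMAND_MASTER, ICW_1);
    outportb(PIC_COMMAND_SLAVE, ICW_1);
    // ICW2: vector offset for each PIC
    outportb(PIC_DATA_MASTER, ICW_2_M);
    outportb(PIC_DATA_SLAVE, ICW_2_S);
    // ICW3: how the master and slave are cascaded
    outportb(PIC_DATA_MASTER, ICW_3_M);
    outportb(PIC_DATA_SLAVE, ICW_3_S);
    // ICW4: operating mode
    outportb(PIC_DATA_MASTER, ICW_4);
    outportb(PIC_DATA_SLAVE, ICW_4);
    // Mask every interrupt line on both PICs
    outportb(PIC_DATA_MASTER, 0xFF);
    outportb(PIC_DATA_SLAVE, 0xFF);
}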

The old x86 architecture had two PIC chips, called “master” and “slave”, and each of them has its own command port and data port: 0x20 and 0x21 for the master, 0xA0 and 0xA1 for the slave.

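The ICW values are initialization commands (ICW stands for Initialization Command Word). Every command word is one byte: ICW1 starts the initialization sequence, ICW2 sets the vector offset of each PIC, ICW3 describes how the master and slave are cascaded, and ICW4 selects the operating mode. The final two writes of 0xFF mask every interrupt line on both PICs.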
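The disable_pic listing relies on constants that are not shown in this chapter. A minimal sketch of plausible definitions follows. The port numbers are the conventional ones for a PC, while the ICW2 offsets (0x20 and 0x28) are only an assumed choice: any range above the first 32 vectors, which are reserved for exceptions, would do.

```c
/* Conventional I/O ports of the two legacy PICs on a PC. */
#define PIC_COMMAND_MASTER 0x20
#define PIC_DATA_MASTER    0x21
#define PIC_COMMAND_SLAVE  0xA0
#define PIC_DATA_SLAVE     0xA1

/* ICW1: begin initialization and announce that ICW4 will follow. */
#define ICW_1   0x11
/* ICW2: vector offset of each PIC (assumed choice, 8 vectors per PIC). */
#define ICW_2_M 0x20
#define ICW_2_S 0x28
/* ICW3: the master has a slave on IRQ2 (bit mask), the slave has cascade identity 2. */
#define ICW_3_M 0x04
#define ICW_3_S 0x02
/* ICW4: 8086/88 mode. */
#define ICW_4   0x01
```

Since the PICs end up fully masked, the exact ICW2 offsets matter little here, but remapping them away from the exception vectors avoids confusion if a stray interrupt is ever delivered.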

Discovering the Local APIC

The first step needed to configure the LAPIC is getting access to it. The APIC registers are memory mapped, and to get their location we need to read the MSR (model specific register) that contains their base address, using the rdmsr instruction. This instruction reads the content of the MSR specified in ecx and places the result in eax and edx (with eax containing the lower 32 bits, and edx containing the upper 32 bits).

In our case the MSR that we need to read is called IA32_APIC_BASE and its address is 0x1B.

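This register contains the following information: bit 8 is set if this core is the bootstrap processor (BSP), bit 11 is the APIC global enable flag, and the bits from 12 upwards hold the base address of the local APIC registers. The remaining bits are reserved, apart from bit 10 which is covered in the X2 APIC section.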

Note that the registers are given as a physical address, so to access them we will need to map them somewhere in the virtual address space. This is true for the addresses of any I/O APICs we obtain as well. When the system boots, the base address is usually 0xFEE00000 and often this is the value we read from rdmsr.
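As a rough sketch, the value returned by rdmsr could be decoded as follows. The helper names are hypothetical, and the two halves are passed in as plain integers so the decoding can be shown without the privileged instruction itself; the address mask assumes that the reserved upper bits read as zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR  0x1B
#define APIC_BASE_BSP       (1ull << 8)             /* core is the bootstrap processor */
#define APIC_BASE_ENABLE    (1ull << 11)            /* APIC global enable */
#define APIC_BASE_ADDR_MASK 0xFFFFFFFFFFFFF000ull   /* bits 12 and above */

/* Combine the two halves returned by rdmsr (eax = low, edx = high). */
static inline uint64_t msr_combine(uint32_t eax, uint32_t edx) {
    return ((uint64_t)edx << 32) | eax;
}

/* Physical base address of the local APIC registers. */
static inline uint64_t apic_base_phys(uint64_t msr) {
    return msr & APIC_BASE_ADDR_MASK;
}

static inline bool apic_is_bsp(uint64_t msr) {
    return (msr & APIC_BASE_BSP) != 0;
}

static inline bool apic_is_globally_enabled(uint64_t msr) {
    return (msr & APIC_BASE_ENABLE) != 0;
}
```

The address obtained this way is physical, so it still has to be mapped into the virtual address space (ideally as uncacheable memory) before any register is touched.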

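A complete list of local APIC registers is available in the Intel/AMD software development manuals, but the important ones for now are the local APIC ID (offset 0x20), the EOI register (0xB0), the spurious interrupt vector register (0xF0) and the timer LVT (0x320).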

Enabling the Local APIC, and The Spurious Vector

The spurious vector register also contains some miscellaneous config for the local APIC, including the enable/disable flag. This register has the following format:

Bits Value
0-7 Spurious vector
8 APIC Software enable/disable
9 Focus Processor checking
10-31 Reserved

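The functions of the fields in the register are as follows: the spurious vector is the IDT entry used when the local APIC raises a spurious interrupt, the software enable bit turns the local APIC on or off, and focus processor checking is an optional feature that is not available on all processors.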

The Spurious Vector register is writable only in the first 9 bits, the rest is read only. In order to enable the LAPIC we need to set bit 8, and set up a spurious vector entry in the IDT. In modern processors the spurious vector can be any vector, however old CPUs have the lower 4 bits of the spurious vector hardwired to 1, meaning that the vector number must end in 0xF. For compatibility it’s best to pick such a vector, with 0xFF being a common choice. Of course we need to set up the corresponding IDT entry with a function to handle it, but for now printing an error message is enough.
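A minimal sketch of enabling the local APIC could look like this. It assumes the registers have already been mapped and that lapic points at the start of that mapping; lapic_mmio_read, lapic_mmio_write and lapic_enable are hypothetical names, and 0xFF is just one possible spurious vector.

```c
#include <stdint.h>

#define LAPIC_REG_SPURIOUS 0xF0        /* spurious interrupt vector register */
#define LAPIC_SW_ENABLE    (1u << 8)   /* APIC software enable bit */

/* Registers are 32 bits wide, so byte offsets are divided by 4. */
static inline uint32_t lapic_mmio_read(volatile uint32_t *lapic, uint32_t offset) {
    return lapic[offset / 4];
}

static inline void lapic_mmio_write(volatile uint32_t *lapic, uint32_t offset,
                                    uint32_t value) {
    lapic[offset / 4] = value;
}

/* Set the spurious vector and the software enable bit in one write. */
void lapic_enable(volatile uint32_t *lapic, uint8_t spurious_vector) {
    uint32_t svr = lapic_mmio_read(lapic, LAPIC_REG_SPURIOUS);
    svr &= ~0xFFu;              /* clear the old vector */
    svr |= spurious_vector;     /* install the new one */
    svr |= LAPIC_SW_ENABLE;     /* bit 8: enable the local APIC */
    lapic_mmio_write(lapic, LAPIC_REG_SPURIOUS, svr);
}
```

A call such as lapic_enable(lapic_base, 0xFF) would then be made once per core, after the IDT entry for vector 0xFF has been installed.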

Reading APIC Id and Version

The ID register contains the physical ID of the local APIC in the system. This is unique and assigned by the firmware when the system is first started. Often this ID is used to distinguish each processor from the others. This register is allowed to be read/write in some processors, but it’s recommended to treat it as read-only.
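In the regular (non-X2APIC) mode the ID sits in the top 8 bits of the register at offset 0x20. A small sketch of extracting it from a value that has already been read, using a hypothetical helper name:

```c
#include <stdint.h>

#define LAPIC_REG_ID 0x20   /* offset of the local APIC ID register */

/* In xAPIC mode the ID occupies bits 31:24 of the ID register. */
static inline uint8_t lapic_id_from_reg(uint32_t id_reg) {
    return (uint8_t)(id_reg >> 24);
}
```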

The version register contains some useful (if not really needed) information. Exploring this register is left as an exercise to the reader.

Local Vector Table

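The local vector table allows the software to specify how the local interrupts are delivered. There are 6 items in the LVT, from offset 0x320 to 0x370: the timer (0x320), the thermal sensor (0x330), the performance monitoring counters (0x340), LINT0 (0x350), LINT1 (0x360) and the error entry (0x370).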

The LINT0 and LINT1 pins are mostly used for emulating the legacy PIC, but they may also be used as NMI sources. These are best left untouched until we have parsed the MADT, which will tell how the LVT for these pins should be programmed.

Most LVT entries use the following format, with the timer LVT being the notable exception. Its format is explained in the timers chapter. The thermal sensor and performance entries ignore bits 15:13.

Bit Description
0:7 Interrupt Vector. This is the IDT entry we want to trigger for this interrupt.
8:10 Delivery mode (see below)
11 Destination Mode, can be either physical or logical.
12 Delivery Status (Read Only), whether the interrupt has been served or not.
13 Pin polarity: 0 is active-high, 1 is active-low.
14 Remote IRR (Read Only) used by the APIC for managing level-triggered interrupts.
15 Trigger mode: 0 is edge-triggered, 1 is level-triggered.
16 Interrupt mask, if it is 1 the interrupt is disabled, if 0 is enabled.

The delivery mode field determines how the APIC should present the interrupt to the processor. The fixed mode (0b000) is fine in almost all cases, the other modes are for specific functions or advanced usage.
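To make the bit layout concrete, here is a sketch of a helper that assembles an LVT entry from its fields. The function and constant names are hypothetical; only the writable bits from the table are handled, and the NMI delivery mode value (0b100) comes from the Intel manual rather than from this chapter.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_DELIVERY_FIXED 0x0u        /* 0b000: deliver the given vector */
#define LVT_DELIVERY_NMI   0x4u        /* 0b100: deliver as NMI, vector ignored */
#define LVT_MASKED         (1u << 16)  /* bit 16: interrupt disabled */

/* Build a 32-bit LVT entry following the table above. */
static inline uint32_t lvt_make_entry(uint8_t vector, uint32_t delivery_mode,
                                      bool active_low, bool level_triggered,
                                      bool masked) {
    uint32_t entry = vector;                    /* bits 7:0  */
    entry |= (delivery_mode & 0x7u) << 8;       /* bits 10:8 */
    if (active_low)      entry |= 1u << 13;     /* pin polarity */
    if (level_triggered) entry |= 1u << 15;     /* trigger mode */
    if (masked)          entry |= LVT_MASKED;   /* interrupt mask */
    return entry;
}
```

For instance, lvt_make_entry(0x30, LVT_DELIVERY_FIXED, false, false, false) yields 0x30, an unmasked, edge-triggered, active-high entry for vector 0x30.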

X2 APIC

The X2APIC is an extension of the XAPIC (the local APIC in its regular mode). The main differences are that the registers are now accessed via MSRs and that the ID register is expanded to use a 32-bit value (previously 8 bits). While we’re going to look at how to use this mode, it’s perfectly fine to not support it.

Checking whether the current processor supports the X2APIC or not can be done via cpuid. It will be under leaf 1, bit 21 in ecx. If this bit is set, the processor supports the X2APIC.

Enabling the X2APIC is done by setting bit 10 in the IA32_APIC_BASE MSR. It’s important to note that once this bit is set, we cannot clear it to transition back to the regular APIC operation without resetting the system.

Once enabled, the local APIC registers are no longer memory mapped (trying to access them there is now an error) and can instead be accessed as a range of MSRs starting at 0x800. Since the memory mapped registers are spaced 16 bytes apart while MSRs are numbered consecutively, the offset used to access an APIC register is shifted right by 4 bits.

As an example, the spurious interrupt register is offset 0xF0. To access the MSR version of this register we would shift it right by 4 (0xF0 >> 4 = 0xF) and then add the base offset (0x800) to get the MSR we want. That means the spurious interrupt register is MSR 0x80F.
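The checks and the address translation described in this section can be sketched as below. The names are hypothetical, and the cpuid result is passed in as a plain value so the logic stays independent of how cpuid is executed.

```c
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_BASE         0x800u
#define CPUID_1_ECX_X2APIC      (1u << 21)    /* cpuid leaf 1, ecx bit 21 */
#define APIC_BASE_X2APIC_ENABLE (1ull << 10)  /* IA32_APIC_BASE bit 10 */

/* True if the ecx value returned by cpuid leaf 1 reports X2APIC support. */
static inline bool cpu_has_x2apic(uint32_t cpuid_leaf1_ecx) {
    return (cpuid_leaf1_ecx & CPUID_1_ECX_X2APIC) != 0;
}

/* Translate a memory mapped register offset into its X2APIC MSR number. */
static inline uint32_t x2apic_msr_for(uint32_t mmio_offset) {
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}

/* Value to write back to IA32_APIC_BASE in order to switch to X2APIC mode. */
static inline uint64_t apic_base_with_x2apic(uint64_t apic_base_msr) {
    return apic_base_msr | APIC_BASE_X2APIC_ENABLE;
}
```

With this helper the spurious interrupt register (0xF0) maps to MSR 0x80F and the EOI register (0xB0) to MSR 0x80B.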

Since MSRs are 64-bits, the upper 32 bits are zero on reads and ignored on writes. As always there is an exception to this, which is the ICR register (used for sending IPIs to other cores) which is now a single 64-bit register.

Handling Interrupts

Once an interrupt for the local APIC is served, it won’t send any further interrupts until the end of interrupt signal is sent. To do this write a 0 to the EOI register, and the local APIC will resume sending interrupts to the processor. This is a separate mechanism to the interrupt flag (IF), which also disables interrupts being served to the processor. It is possible to send EOI to the local APIC while IF is cleared (disabling interrupts) and no further interrupts will be served until IF is set again.

There are a few exceptions where sending an EOI is not needed, mainly spurious interrupts and NMIs.

The EOI can be sent at any time when handling an interrupt, but it’s important to do it before returning with iret. If we enable interrupts and only receive a single interrupt, forgetting to send EOI may be the reason.
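A sketch of how the EOI could fit into a common interrupt handler is shown below. The names are hypothetical, lapic is assumed to point at the mapped local APIC registers, and the spurious vector is assumed to be 0xFF.

```c
#include <stdint.h>

#define LAPIC_REG_EOI   0xB0   /* end of interrupt register */
#define SPURIOUS_VECTOR 0xFF   /* assumed choice of spurious vector */

/* Signal end of interrupt by writing 0 to the EOI register. */
static inline void lapic_send_eoi(volatile uint32_t *lapic) {
    lapic[LAPIC_REG_EOI / 4] = 0;
}

/* Called from the common interrupt stub with the vector that fired. */
void interrupt_dispatch(volatile uint32_t *lapic, uint8_t vector) {
    /* ... handle the source of 'vector' here ... */

    /* Spurious interrupts must not be acknowledged with an EOI. */
    if (vector != SPURIOUS_VECTOR) {
        lapic_send_eoi(lapic);
    }
}
```

Sending the EOI from one central place like this makes it harder to forget it in an individual handler.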

Sending An Inter-Processor Interrupt

If we want to support symmetric multiprocessing (SMP) in our kernel, we need to inform other cores that an event has occurred. This is typically done by sending an inter-processor interrupt (IPI). Note that IPIs don’t carry any information about what event occurred, they simply indicate that something has happened. To send data about what the event is, a struct is usually placed somewhere in memory, sometimes called a mailbox.

To send an IPI we need to know the local APIC ID of the core we wish to interrupt. We will also need a vector in the IDT set up for handling IPIs. With these two things we can use the ICR (interrupt command register).

The ICR is 64-bits wide and therefore we access it as two registers (a higher and lower half). The IPI is sent when the lower register is written to, so we should set up the destination in the higher half first, before writing the vector in the lower half.

This register contains a few fields but most can be safely ignored and left to zero. We’re interested in bits 63:56 which is the ID of the target local APIC (in X2APIC mode it is bits 63:32) and bits 7:0 which contain the interrupt vector that will be served on the target core.

An example function might look like the following:

void lapic_send_ipi(uint32_t dest_id, uint8_t vector) {
    lapic_write_reg(ICR_HIGH, dest_id << 24);
    lapic_write_reg(ICR_LOW, vector);
}

At this point the target core would receive an interrupt with the vector we specified (assuming that core is set up correctly).

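There is also a shorthand field in the ICR which overrides the destination ID. It’s available in bits 19:18 and has the following definition: 0b00 means no shorthand (the destination field is used), 0b01 sends the IPI to the current core only, 0b10 sends it to all cores including the current one, and 0b11 sends it to all cores except the current one.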
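A sketch of how the shorthand could be encoded into the lower half of the ICR follows. The enum and function names are hypothetical, and all other ICR fields are left at zero as in the example function.

```c
#include <stdint.h>

/* Values of the destination shorthand field, bits 19:18 of the ICR. */
enum icr_shorthand {
    ICR_NO_SHORTHAND       = 0,  /* use the destination field */
    ICR_SELF               = 1,  /* current core only */
    ICR_ALL_INCLUDING_SELF = 2,  /* every core, including the current one */
    ICR_ALL_EXCLUDING_SELF = 3   /* every core except the current one */
};

/* Build the value for the lower half of the ICR. */
static inline uint32_t icr_low_value(uint8_t vector, enum icr_shorthand shorthand) {
    return ((uint32_t)shorthand << 18) | vector;
}
```

Writing icr_low_value(0x40, ICR_ALL_EXCLUDING_SELF) to the lower ICR register would broadcast vector 0x40 to every other core, with no need to set the destination in the higher half first.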

I/O APIC

The primary function of the I/O APIC is to receive external interrupt events from the system and its I/O devices, and relay them to the local APICs as interrupt messages. With the exception of the LAPIC timer, all external devices are going to use the IRQs provided by it (as was done in the past by the PIC).

Configure the I/O APIC

To configure the I/O APIC we need to:

  1. Get the I/O APIC base address from the MADT
  2. Read the I/O APIC Interrupt Source Override table
  3. Initialize the IO Redirection table entries for the interrupt we want to enable

Getting the I/O APIC address

Read I/O APIC information from the MADT (the MADT is available within the RSDT data, we need to search for the MADT item type 1). The contents of the MADT for the I/O APIC type are:

Offset Length Description
2 1 I/O APIC ID
3 1 Reserved (should be 0)
4 4 I/O APIC Address
8 4 Global System Interrupt Base

The I/O APIC ID field is mostly fluff, as we’ll be accessing the I/O APIC by its MMIO address, not its ID.

The Global System Interrupt Base is the first interrupt number that the I/O APIC handles. In the case of most systems, with only a single I/O APIC, this will be 0.

To check the number of inputs an I/O APIC supports:

uint32_t ioapicver = read_ioapic_register(IOAPICVER);
size_t number_of_inputs = ((ioapicver >> 16) & 0xFF) + 1;

The number of inputs is encoded as bits 23:16 of the IOAPICVER register, minus one.

I/O APIC Registers

The I/O APIC has 2 memory mapped registers for accessing the other I/O APIC registers:

Memory Address Mnemonic Name Register Name Description
FEC0 0000h IOREGSEL I/O Register Select Is used to select the I/O Register to access
FEC0 0010h IOWIN I/O Window (data) Used to access data selected by IOREGSEL

And then there are 4 I/O Registers that can be accessed using the two above:

Name Offset Description Attribute
IOAPICID 00h Identification register for the I/O APIC R/W
IOAPICVER 01h I/O APIC Version RO
IOAPICARB 02h It contains the BUS arbitration priority for the I/O APIC RO
IOREDTBL 10h-3Fh The redirection tables (see the IOREDTBL paragraph) RW

Reading data from I/O APIC

There are two memory mapped registers that we need to use in order to read/write data from the I/O APIC registers: IOREGSEL, located at the I/O APIC base address, and IOWIN, located at the base address plus 0x10.

The format of the IOREGSEL is:

Bit Description
31:8 Reserved
7:0 APIC Register Address, specifies the I/O APIC register to be read or written via the IOWIN register

So basically if we want to read/write a register of the I/O APIC we need to:

  1. write the register index in the IOREGSEL register
  2. read/write the content of the register selected in IOWIN register

The actual read or write operation is performed when IOWIN is accessed. Accessing IOREGSEL has no side effects.
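The two steps can be sketched as a pair of accessors. The names are hypothetical (they differ from the read_ioapic_register helper used earlier because they take the base explicitly), and ioapic is assumed to point at the mapped I/O APIC base address.

```c
#include <stdint.h>

#define IOAPIC_IOREGSEL 0x00   /* register select, at the base address */
#define IOAPIC_IOWIN    0x10   /* data window, at base + 0x10 */

/* Select a register through IOREGSEL, then read it through IOWIN. */
uint32_t ioapic_read(volatile uint32_t *ioapic, uint8_t reg) {
    ioapic[IOAPIC_IOREGSEL / 4] = reg;
    return ioapic[IOAPIC_IOWIN / 4];
}

/* Select a register through IOREGSEL, then write it through IOWIN. */
void ioapic_write(volatile uint32_t *ioapic, uint8_t reg, uint32_t value) {
    ioapic[IOAPIC_IOREGSEL / 4] = reg;
    ioapic[IOAPIC_IOWIN / 4] = value;
}
```

Because the select and the access are two separate steps, a kernel with several cores touching the same I/O APIC would probably want to protect each pair with a lock.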

Interrupt source overrides

Interrupt source overrides describe the differences between the IA-PC standard dual 8259 interrupt definitions and the platform’s actual wiring. The ISA interrupts should be identity mapped into the first I/O APIC sources, but most of the time there will be at least one exception. This table contains those exceptions.

An example is the PIT timer: it is connected to ISA IRQ 0, but when the APIC is enabled it is connected to I/O APIC interrupt input pin 2, so in this case we need an interrupt source override where the IRQ source is 0 and the global system interrupt is 2. The values stored in the I/O APIC interrupt source override entries in the MADT are:

Offset Length Description
2 1 bus source (it should be 0)
3 1 irq source
4 4 Global System Interrupt
8 2 Flags

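Flags are defined as follows: bits 1:0 give the pin polarity (0b00 uses the default for the bus, 0b01 is active-high, 0b11 is active-low) and bits 3:2 give the trigger mode (0b00 uses the default for the bus, 0b01 is edge-triggered, 0b11 is level-triggered). The remaining bits are reserved.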
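As a sketch, the flags could be decoded with two small helpers. The names are hypothetical, the field encoding is the one given in the ACPI specification for MADT interrupt flags, and the bus default is assumed to be the ISA one (active-high, edge-triggered).

```c
#include <stdbool.h>
#include <stdint.h>

#define ISO_POLARITY_MASK   0x3u   /* bits 1:0 */
#define ISO_TRIGGER_SHIFT   2      /* bits 3:2 */
#define ISO_TRIGGER_MASK    0x3u

/* Polarity 0b11 means active-low; 0b00 (bus default) and 0b01 are active-high. */
static inline bool iso_is_active_low(uint16_t flags) {
    return (flags & ISO_POLARITY_MASK) == 0x3u;
}

/* Trigger 0b11 means level-triggered; 0b00 (bus default) and 0b01 are edge. */
static inline bool iso_is_level_triggered(uint16_t flags) {
    return ((flags >> ISO_TRIGGER_SHIFT) & ISO_TRIGGER_MASK) == 0x3u;
}
```

The two results map directly onto the polarity and trigger mode bits of the redirection entry programmed for that global system interrupt.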

IO Redirection Table (IOREDTBL)

The redirection table entries can be accessed via the memory-mapped registers described above (IOREGSEL and IOWIN). Each entry is composed of 2 registers (starting from offset 10h), so for example the first entry is composed of registers 10h and 11h.

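The content of each entry follows the same layout as the LVT format described earlier for bits 16:0 (vector, delivery mode, destination mode, delivery status, pin polarity, remote IRR, trigger mode and mask), while bits 63:56 hold the ID of the destination local APIC.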

The number of entries can be read from the IOAPICVER register, and on modern systems it is usually 24.
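A sketch of how a redirection entry could be located and assembled is shown below. The names are hypothetical, and the bit layout is the one from the I/O APIC datasheet: vector in bits 7:0, polarity in bit 13, trigger mode in bit 15, mask in bit 16 and the destination local APIC ID in bits 63:56, with fixed delivery and physical destination mode left at zero.

```c
#include <stdbool.h>
#include <stdint.h>

#define IOREDTBL_BASE 0x10u   /* register index of the first entry */

/* Each entry spans two consecutive 32-bit registers. */
static inline uint8_t ioredtbl_low_reg(uint8_t pin) {
    return (uint8_t)(IOREDTBL_BASE + pin * 2u);
}

static inline uint8_t ioredtbl_high_reg(uint8_t pin) {
    return (uint8_t)(IOREDTBL_BASE + pin * 2u + 1u);
}

/* Build a 64-bit entry using fixed delivery and physical destination mode. */
static inline uint64_t ioredtbl_make_entry(uint8_t vector, uint8_t dest_apic_id,
                                           bool active_low, bool level_triggered,
                                           bool masked) {
    uint64_t entry = vector;                        /* bits 7:0   */
    if (active_low)      entry |= 1ull << 13;       /* pin polarity */
    if (level_triggered) entry |= 1ull << 15;       /* trigger mode */
    if (masked)          entry |= 1ull << 16;       /* interrupt mask */
    entry |= (uint64_t)dest_apic_id << 56;          /* bits 63:56 */
    return entry;
}
```

The lower 32 bits of the result go to the register given by ioredtbl_low_reg and the upper 32 bits to the one given by ioredtbl_high_reg. Writing the upper half first, or keeping the entry masked until both halves are in place, avoids a half-programmed entry being used.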