The page table mapping process in Linux memory management

Reprinted from:

Keywords: swapper_pg_dir, ARM PGD/PTE, Linux PGD/PTE, pgd_offset_k.


Two kinds of page tables are involved in page table mapping under Linux: the page tables Linux itself maintains, and the page tables walked by the ARM32 MMU hardware.


1. ARM32 page table mapping

Since the page table entry formats used by the ARM32 hardware and by the Linux kernel differ, two sets of PTEs are maintained.

The PGD is stored in swapper_pg_dir. One Linux PGD directory entry actually holds two ARM32 hardware PGD entries.

Therefore, when allocating PTEs, 1024 are allocated in total: 512 maintained by the Linux OS and 512 for the ARM32 MMU, matching the page tables of the two hardware PGD entries.

Since the Linux and ARM32 PTEs sit next to each other in memory, converting between the two is convenient.

1.1 ARM32 processor query page table

32-bit Linux uses three-level mapping: PGD-->PMD-->PTE; 64-bit Linux uses four-level mapping: PGD-->PUD-->PMD-->PTE, with an extra PUD level.

The abbreviations are PGD: Page Global Directory, PUD: Page Upper Directory, PMD: Page Middle Directory, PTE: Page Table Entry.

ARM32 Linux uses two-level mapping with the PMD folded away; three-level mapping is used only when CONFIG_ARM_LPAE is defined.


In the ARM32 architecture, memory can be mapped by section, which is a single-level mapping mode.

Page mapping instead requires a two-level table structure, and pages can be 64KB or 4KB in size.

1.1.1 ARM32 MMU 4KB page mapping process

If page table mapping is used, the section mapping table becomes the first-level page table (called the PGD in Linux), and its entries no longer supply a physical address directly, but the base address of a second-level page table.

The upper 12 bits (bits[31:20]) of the 32-bit virtual address are used as the index into the first-level page table; the entry found there points to a second-level page table.

The next 8 bits (bits[19:12]) of the virtual address are used as the index into the second-level page table; the entry found there supplies the 20-bit physical page address.

Finally, the 20-bit physical page address is combined with the lower 12 bits of the virtual address to form the final 32-bit physical address.

In the ARM32 architecture this walk is performed entirely by the MMU hardware; no software involvement is needed.


ARM32 architecture MMU page table mapping process


1.1.2 Overview of Short Descriptor mapping in ARMv7-AR

The mapping process for 4KB pages is described in the ARMv7-A/R Architecture Reference Manual.

As an overview of the address mapping: the first-level table base is taken from the translation table base register (TTBR0 or TTBR1), and VA[31:20] of the 32-bit virtual address is used as the index to find the second-level table address.

VA[19:12] of the virtual address is then used as the index into the second-level table to find the page address.

Small Page mapping process in the specification

Figure B3-11 (Small page address translation) shows the details of the mapping:


1.2 Linux page table mapping related data structures

We know that page table mappings are created by create_mapping(), called from map_lowmem(). The parameter of this function is a struct map_desc.

Let's study its related structure to help understand how the kernel handles page table mapping.


struct map_desc {
    unsigned long virtual;   /* virtual start address */
    unsigned long pfn;       /* starting physical page frame number */
    unsigned long length;    /* size of the memory region */
    unsigned int type;       /* index into mem_types[] */
};


The type field of map_desc indexes the mem_types array of struct mem_type entries:

struct mem_type {
    pteval_t prot_pte;       /* PTE attributes */
    pteval_t prot_pte_s2;    /* only used when CONFIG_ARM_LPAE is defined */
    pmdval_t prot_l1;        /* PMD (first-level) attributes */
    pmdval_t prot_sect;      /* attributes for Section-type mappings */
    unsigned int domain;     /* which ARM domain this type belongs to */
};

static struct mem_type mem_types[] = {
    [MT_MEMORY_RWX] = {
        .prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY,   /* note these are L_PTE_* (Linux) flags; they must be converted to hardware bits when the MMU PTE is written */
        .prot_l1   = PMD_TYPE_TABLE,
        .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
        .domain    = DOMAIN_KERNEL,
    },
    [MT_MEMORY_RW] = {
        .prot_pte  = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
                     L_PTE_XN,
        .prot_l1   = PMD_TYPE_TABLE,
        .prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
        .domain    = DOMAIN_KERNEL,
    },
    ...
};


The following focuses on the details of the first-level and second-level page tables of the Page Table mapping type, together with their definitions in the Linux kernel:

Definition of the PGD in ARM32

The following is a detailed description of the First-level descriptor:

/*
 * Hardware page table definitions.
 *
 * + Level 1 descriptor (PMD)
 *   - common
 */
#define PMD_TYPE_MASK        (_AT(pmdval_t, 3) << 0)    /* type 01 corresponds to Page Table */
#define PMD_TYPE_FAULT        (_AT(pmdval_t, 0) << 0)
#define PMD_TYPE_TABLE        (_AT(pmdval_t, 1) << 0)
#define PMD_TYPE_SECT        (_AT(pmdval_t, 2) << 0)
#define PMD_PXNTABLE        (_AT(pmdval_t, 1) << 2)     /* v7 */
#define PMD_BIT4        (_AT(pmdval_t, 1) << 4)
#define PMD_DOMAIN(x)        (_AT(pmdval_t, (x)) << 5)
#define PMD_PROTECTION        (_AT(pmdval_t, 1) << 9)        /* v5 */

Definition of PTE in ARM32

The following is a detailed description of the Second-level descriptor:

/*
 * + Level 2 descriptor (PTE)
 *   - common
 */
#define PTE_TYPE_MASK        (_AT(pteval_t, 3) << 0)
#define PTE_TYPE_FAULT        (_AT(pteval_t, 0) << 0)
#define PTE_TYPE_LARGE        (_AT(pteval_t, 1) << 0)
#define PTE_TYPE_SMALL        (_AT(pteval_t, 2) << 0)
#define PTE_TYPE_EXT        (_AT(pteval_t, 3) << 0)        /* v5 */
#define PTE_BUFFERABLE        (_AT(pteval_t, 1) << 2)
#define PTE_CACHEABLE        (_AT(pteval_t, 1) << 3)

/*
 *   - extended small page/tiny page
 */
#define PTE_EXT_XN        (_AT(pteval_t, 1) << 0)        /* v6 */
#define PTE_EXT_AP_MASK        (_AT(pteval_t, 3) << 4)
#define PTE_EXT_AP0        (_AT(pteval_t, 1) << 4)
#define PTE_EXT_AP1        (_AT(pteval_t, 2) << 4)
#define PTE_EXT_AP_UNO_SRO    (_AT(pteval_t, 0) << 4)
#define PTE_EXT_TEX(x)        (_AT(pteval_t, (x)) << 6)    /* v5 */
#define PTE_EXT_APX        (_AT(pteval_t, 1) << 9)        /* v6 */
#define PTE_EXT_COHERENT    (_AT(pteval_t, 1) << 9)        /* XScale3 */
#define PTE_EXT_SHARED        (_AT(pteval_t, 1) << 10)    /* v6 */
#define PTE_EXT_NG        (_AT(pteval_t, 1) << 11)    /* v6 */

Definition of PTE in Linux

Since Linux's PTE definition differs from the ARM hardware's, the definitions below beginning with L_PTE_ are all Linux-only; those beginning with L_PTE_MT_ encode the memory type held in bits[5:2].

/*
 * "Linux" PTE definitions.
 *
 * We keep two sets of PTEs - the hardware and the linux version.
 * This allows greater flexibility in the way we map the Linux bits
 * onto the hardware tables, and allows us to have YOUNG and DIRTY
 * bits.
 *
 * The PTE table pointer refers to the hardware entries; the "Linux"
 * entries are stored 1024 bytes below.
 */
#define L_PTE_VALID        (_AT(pteval_t, 1) << 0)        /* Valid */
#define L_PTE_PRESENT        (_AT(pteval_t, 1) << 0)
#define L_PTE_YOUNG        (_AT(pteval_t, 1) << 1)
#define L_PTE_DIRTY        (_AT(pteval_t, 1) << 6)
#define L_PTE_RDONLY        (_AT(pteval_t, 1) << 7)
#define L_PTE_USER        (_AT(pteval_t, 1) << 8)
#define L_PTE_XN        (_AT(pteval_t, 1) << 9)
#define L_PTE_SHARED        (_AT(pteval_t, 1) << 10)    /* shared(v6), coherent(xsc3) */
#define L_PTE_NONE        (_AT(pteval_t, 1) << 11)

/*
 * These are the memory types, defined to be compatible with
 * pre-ARMv6 CPUs cacheable and bufferable bits:   XXCB
 */
#define L_PTE_MT_UNCACHED    (_AT(pteval_t, 0x00) << 2)    /* 0000 */
#define L_PTE_MT_BUFFERABLE    (_AT(pteval_t, 0x01) << 2)    /* 0001 */
#define L_PTE_MT_WRITETHROUGH    (_AT(pteval_t, 0x02) << 2)    /* 0010 */
#define L_PTE_MT_WRITEBACK    (_AT(pteval_t, 0x03) << 2)    /* 0011 */
#define L_PTE_MT_MINICACHE    (_AT(pteval_t, 0x06) << 2)    /* 0110 (sa1100, xscale) */
#define L_PTE_MT_WRITEALLOC    (_AT(pteval_t, 0x07) << 2)    /* 0111 */
#define L_PTE_MT_DEV_SHARED    (_AT(pteval_t, 0x04) << 2)    /* 0100 */
#define L_PTE_MT_DEV_NONSHARED    (_AT(pteval_t, 0x0c) << 2)    /* 1100 */
#define L_PTE_MT_DEV_WC        (_AT(pteval_t, 0x09) << 2)    /* 1001 */
#define L_PTE_MT_DEV_CACHED    (_AT(pteval_t, 0x0b) << 2)    /* 1011 */
#define L_PTE_MT_VECTORS    (_AT(pteval_t, 0x0f) << 2)    /* 1111 */
#define L_PTE_MT_MASK        (_AT(pteval_t, 0x0f) << 2)


Bits[8:5] of the ARM PMD (first-level) descriptor select the Domain, but ARM Linux defines only three distinct domains (DOMAIN_TABLE aliases DOMAIN_KERNEL):

#define DOMAIN_KERNEL   2   /* kernel space */
#define DOMAIN_TABLE    2
#define DOMAIN_USER     1   /* user space */
#define DOMAIN_IO       0   /* I/O address domain */


1.3 Set the PGD page directory

The parameter of create_mapping() is a struct map_desc describing the linear mapping of a virtual address range onto a physical range; the PGD entries and PTEs are created from this description.

static void __init create_mapping(struct map_desc *md)
{
    unsigned long addr, length, end;
    phys_addr_t phys;
    const struct mem_type *type;
    pgd_t *pgd;

    type = &mem_types[md->type];      /* look up the corresponding struct mem_type */
    addr = md->virtual & PAGE_MASK;   /* align to a page boundary */
    phys = __pfn_to_phys(md->pfn);    /* page frame number to physical address */
    length = PAGE_ALIGN(md->length + (md->virtual & ~PAGE_MASK));

    pgd = pgd_offset_k(addr);         /* find the PGD entry for virtual address addr */
    end = addr + length;
    do {
        unsigned long next = pgd_addr_end(addr, end);

        alloc_init_pud(pgd, addr, next, phys, type);   /* initialize the next-level page table */

        phys += next - addr;
        addr = next;
    } while (pgd++, addr != end);     /* traverse the range in PGDIR_SIZE (2MB) steps */
}

There are three places to explain here:


First, the virtual address is converted to obtain the corresponding PGD entry pointer:

#define PGDIR_SHIFT 21

/* to find an entry in a page-table-directory */
#define pgd_index(addr)    ((addr) >> PGDIR_SHIFT)

#define pgd_offset(mm, addr)    ((mm)->pgd + pgd_index(addr))

#define pgd_offset_k(addr)    pgd_offset(&init_mm, addr)

struct mm_struct init_mm = {
    .mm_rb        = RB_ROOT,
    .pgd        = swapper_pg_dir,
    .mm_users    = ATOMIC_INIT(2),
    .mm_count    = ATOMIC_INIT(1),
    .mmap_sem    = __RWSEM_INITIALIZER(init_mm.mmap_sem),
    .page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
    .mmlist        = LIST_HEAD_INIT(init_mm.mmlist),
    ...
};


From the virtual memory layout it can be seen that swapper_pg_dir is 16KB in size (explained in detail there). init_mm.pgd points to swapper_pg_dir.




#define pgd_addr_end(addr, end)                        \
({    unsigned long __boundary = ((addr) + PGDIR_SIZE) & PGDIR_MASK;    \
    (__boundary - 1 < (end) - 1) ? __boundary : (end);        \
})
/*
 * PMD_SHIFT determines the size of the area a second-level page table can map
 * PGDIR_SHIFT determines what a third-level page table entry can map
 */
#define PMD_SHIFT        21
#define PGDIR_SHIFT        21

#define PMD_SIZE        (1UL << PMD_SHIFT)
#define PMD_MASK        (~(PMD_SIZE-1))
#define PGDIR_SIZE        (1UL << PGDIR_SHIFT)
#define PGDIR_MASK        (~(PGDIR_SIZE-1))


Since PGDIR_SHIFT is 21, one PGD page table entry corresponds to 2MB of address space, i.e. [addr, addr+PGDIR_SIZE). The number of Linux PGD entries is therefore 2^11 = 2048, and since each pgd_t is 8 bytes, the entire PGD table occupies 2048*8B = 16KB.

At first sight this is inconsistent with the 4096 first-level entries the ARM hardware expects. This involves a Linux implementation trick, analyzed below where the PTEs are created.

Therefore, the [virtual, virtual+length) range is traversed in 2MB steps, creating the PGD page table entries and PTEs.


Since ARM32 Linux uses two-level page table mapping, the PUD/PMD levels are folded away and control passes directly to alloc_init_pte() to create the PTEs.


static void __init alloc_init_pte(pmd_t *pmd, unsigned long addr,   /* on two-level ARM32, pmd == pud == pgd here */
                  unsigned long end, unsigned long pfn,
                  const struct mem_type *type)
{
    pte_t *pte = early_pte_alloc(pmd, addr, type->prot_l1);   /* allocate the PTE table (using prot_l1 for the PGD entry) and return the PTE for addr */
    do {
        set_pte_ext(pte, pfn_pte(pfn, __pgprot(type->prot_pte)), 0);   /* architecture-specific assembly writes the PTE */
        pfn++;
    } while (pte++, addr += PAGE_SIZE, addr != end);   /* traverse [addr, end) in PAGE_SIZE steps */
}

Let's see how the PTE table is allocated and the PGD entry populated:

static pte_t * __init early_pte_alloc(pmd_t *pmd, unsigned long addr, unsigned long prot)
{
    if (pmd_none(*pmd)) {   /* PGD entry is empty, i.e. the PTE table has not been created yet */
        pte_t *pte = early_alloc(PTE_HWTABLE_OFF + PTE_HWTABLE_SIZE);   /* allocate 512 + 512 PTE entries */
        __pmd_populate(pmd, __pa(pte), prot);   /* fill in the PMD entries and flush them to RAM */
    }
    return pte_offset_kernel(pmd, addr);   /* return the PTE corresponding to addr */
}

static void __init *early_alloc_aligned(unsigned long sz, unsigned long align)
{
    void *ptr = __va(memblock_alloc(sz, align));   /* allocate from memblock; here sz is 4096 bytes, exactly one page */
    memset(ptr, 0, sz);
    return ptr;
}

Therefore, the space for the PTE table is allocated from memblock. PTE_HWTABLE_OFF and PTE_HWTABLE_SIZE are each 2048 bytes (512 entries of 4 bytes), so one 4KB page holding 1024 PTEs is allocated.

The following is a schematic diagram of the space allocated by early_pte_alloc: the first 512 entries are for the Linux OS, and the last 512 entries are for the ARM hardware MMU.





Linux kernel PGD/PTE mapping relationship



static inline void __pmd_populate(pmd_t *pmdp, phys_addr_t pte,
                  pmdval_t prot)
{
    pmdval_t pmdval = (pte + PTE_HWTABLE_OFF) | prot;   /* pmdp[0] content: the hardware table starts 2048 bytes into the page */
    pmdp[0] = __pmd(pmdval);
    pmdp[1] = __pmd(pmdval + 256 * sizeof(pte_t));      /* the adjacent pmdp[1] points 256 hardware PTEs further on */
    flush_pmd_entry(pmdp);                              /* flush both entries to RAM */
}


The layout of Linux's PGD page directory thus differs from ARM32's (2048 8-byte entries instead of 4096 4-byte entries), but the total size is the same as ARM32's.


In arm_mm_memblock_reserve(), the swapper_pg_dir region is reserved, from which it can be seen that its size is 16KB.

Looking at SWAPPER_PG_DIR_SIZE: there are 2048 Linux PGD entries in total, and each holds two adjacent hardware page directory entries.

typedef pmdval_t pgd_t[2];   /* 8 bytes: one Linux PGD entry holds two hardware entries */

#define SWAPPER_PG_DIR_SIZE (PTRS_PER_PGD * sizeof(pgd_t))   /* 2048 * 8B = 16KB */

/*
 * Reserve the special regions of memory
 */
void __init arm_mm_memblock_reserve(void)
{
    /*
     * Reserve the page tables.  These are already in use,
     * and can only be in node 0.
     */
    memblock_reserve(__pa(swapper_pg_dir), SWAPPER_PG_DIR_SIZE);
    ...
}


1.4 Setting PTE entry

To understand how a PTE entry is set, refer to the description of second-level descriptors in B3.3.1 "Translation table entry formats".




#define set_pte_ext(ptep,pte,ext) cpu_set_pte_ext(ptep,pte,ext)

#ifndef MULTI_CPU
#define cpu_set_pte_ext            __glue(CPU_NAME,_set_pte_ext)
#endif

arch/arm/mm/proc-v7-2level.S:

/*
 * cpu_v7_set_pte_ext(ptep, pte)
 *
 * Set a level 2 translation table entry.
 *
 * - ptep - pointer to level 2 translation table entry   (passed in r0)
 *          (hardware version is stored at +2048 bytes)
 * - pte  - PTE value to store                           (passed in r1)
 * - ext  - value for extended PTE bits                  (passed in r2)
 */
ENTRY(cpu_v7_set_pte_ext)
#ifdef CONFIG_MMU
    str    r1, [r0]                @ store the Linux PTE value (r1) at ptep (r0)
    bic    r3, r1, #0x000003f0     @ clear bits[9:4] of r1, result in r3
    bic    r3, r3, #PTE_TYPE_MASK  @ PTE_TYPE_MASK is 0x03: also clear the low 2 bits
    orr    r3, r3, r2              @ merge in the extended bits (r2)
    orr    r3, r3, #PTE_EXT_AP0 | 2    @ set bit4 (AP0) and bit1: a Small page
    tst    r1, #1 << 4             @ is bit4 of the Linux PTE set?
    orrne  r3, r3, #PTE_EXT_TEX(1) @ if so, set TEX to 1
    eor    r1, r1, #L_PTE_DIRTY
    tst    r1, #L_PTE_RDONLY | L_PTE_DIRTY
    orrne  r3, r3, #PTE_EXT_APX    @ read-only or clean page: set AP[2]
    tst    r1, #L_PTE_USER
    orrne  r3, r3, #PTE_EXT_AP1    @ user-accessible: set AP[1]
    tst    r1, #L_PTE_XN
    orrne  r3, r3, #PTE_EXT_XN     @ set the XN bit
    tst    r1, #L_PTE_YOUNG
    tstne  r1, #L_PTE_VALID
    eorne  r1, r1, #L_PTE_NONE
    tstne  r1, #L_PTE_NONE
    moveq  r3, #0                  @ not young/valid (or NONE set): write a fault entry
 ARM(    str    r3, [r0, #2048]! ) @ store the hardware PTE at r0+2048 bytes, not r0
 THUMB(    add    r0, r0, #2048 )
 THUMB(    str    r3, [r0] )
    ALT_SMP(W(nop))
    ALT_UP (mcr    p15, 0, r0, c7, c10, 1)    @ flush_pte
#endif
    bx    lr
ENDPROC(cpu_v7_set_pte_ext)

Posted by hykc on Tue, 15 Nov 2022 00:32:44 +1030