This is an original article. If you repost it, please credit the source: www.cnblogs.com/tolimit/

 

I have recently been studying the kernel's memory management framework. There is a lot of material, so this post is a summary of what I have learned so far.

 

Segmentation and paging

Let’s start with a picture

That is, the addresses we use when writing code do not refer directly to physical memory. An address used in code is a logical address; it is translated first by segmentation and then by paging into a physical address. Because Linux makes only minimal use of the segmentation mechanism, under Linux a logical address can be treated as equal to its linear address. In other words, we program with linear addresses, and only the paging mechanism is needed to turn such an address into a physical address. The paging model is therefore the more important part to explain.

 

The system divides physical memory into page frames, each normally 4KB in size (or 4MB when the hardware's extended paging (PSE) is used, but Linux does not use PSE; it may use PAE instead). With 1GB of physical memory, for example, the system divides it into 262144 page frames. Given a linear address, the paging mechanism translates it into the physical address inside the corresponding page frame. Here is the paging model used by Linux

Linux uses a four-level paging model. The four levels are the page global directory (PGD), the page upper directory (PUD), the page middle directory (PMD), and the page table (PTE). Each of these directories and tables occupies exactly one page. Not all hardware actually uses all four levels under Linux. On 32-bit systems without Physical Address Extension (PAE), only two levels are used: Linux folds away the page upper directory and the page middle directory (each is treated as containing a single entry). On 32-bit systems with PAE enabled, Linux uses three levels, folding away only the page upper directory. On 64-bit systems, Linux uses three or four levels depending on the hardware. The whole translation from linear address to physical address is performed automatically by the CPU.
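To make the index arithmetic concrete, here is a small user-space sketch (not kernel code) of how a 32-bit linear address is split under the two-level layout described above. The shift values follow the classic x86 layout without PAE; the real kernel does this walk through helpers such as pgd_offset() and the per-architecture shift constants.

#include <stdio.h>

/* Minimal sketch: 10 bits of PGD index, 10 bits of page-table index,
 * 12 bits of page offset (4KB pages, no PAE). */
#define PAGE_SHIFT   12
#define PTRS_PER_PTE 1024
#define PGDIR_SHIFT  22

static void decompose(unsigned long vaddr)
{
    unsigned long pgd_idx = vaddr >> PGDIR_SHIFT;                       /* bits 31..22 */
    unsigned long pte_idx = (vaddr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1); /* bits 21..12 */
    unsigned long offset  = vaddr & ((1UL << PAGE_SHIFT) - 1);          /* bits 11..0  */

    printf("vaddr=0x%08lx -> pgd index=%lu, pte index=%lu, offset=0x%lx\n",
           vaddr, pgd_idx, pte_idx, offset);
}

int main(void)
{
    decompose(0xC0000001UL); /* a kernel-space address */
    decompose(0x08048000UL); /* a typical user-space text address */
    return 0;
}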

Each process has its own page global directory. While a process is running, the base address of its page global directory is held in the CR3 register; when the process is switched out, the value of CR3 is saved in its process descriptor. (We will also meet the CR2 register later, which is used in page fault handling.) A running process uses its own page tables; in kernel mode, for example after a system call, the kernel page tables are used. In fact, in every process's page tables, the entries covering linear addresses above 0xC0000000 are identical to the corresponding entries of the master kernel page global directory (stored in init_mm.pgd). This is how the address spaces of processes are isolated from one another while the kernel space is shared by all of them. When part of the kernel mapping is modified (at run time this only concerns the mappings of the kernel's high, non-contiguous memory area), the kernel updates only the corresponding entry of the master kernel page global directory; when another process later accesses one of those linear addresses, a page fault occurs, and the fault handler copies the entry into that process's page tables so the address becomes mapped.
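As a rough illustration of how the kernel half of the address space ends up shared, the sketch below (hypothetical code, not the kernel's actual pgd_alloc()) copies the entries above PAGE_OFFSET from the master kernel page global directory into a newly created one. The constants mirror the classic 32-bit 3GB/1GB split with 4MB covered per PGD entry; pgd_t is simplified to a plain integer for illustration.

/* Hedged sketch: a new PGD starts with an empty user half and a kernel
 * half copied from the master kernel PGD (init_mm.pgd), so every process
 * sees the same kernel mappings. */
#define PAGE_OFFSET        0xC0000000UL
#define PTRS_PER_PGD       1024
#define USER_PTRS_PER_PGD  (PAGE_OFFSET / (4UL << 20))  /* 768 entries of 4MB each */

typedef unsigned long pgd_t; /* simplified; the kernel wraps this in a struct */

static void copy_kernel_mappings(pgd_t *new_pgd, const pgd_t *master_pgd)
{
    unsigned int i;

    /* User part (entries 0..767) starts out empty: filled on demand later */
    for (i = 0; i < USER_PTRS_PER_PGD; i++)
        new_pgd[i] = 0;

    /* Kernel part (entries 768..1023) mirrors the master kernel PGD */
    for (i = USER_PTRS_PER_PGD; i < PTRS_PER_PGD; i++)
        new_pgd[i] = master_pgd[i];
}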

Since each process has its own page global directory, it may seem that with 100 processes the kernel would have to keep 100 complete page-table trees in memory, which would consume a great deal of it. In fact, the system only allocates a path through the tree when the process actually uses it. For example, when we access a linear address that does not yet have a page upper directory, page middle directory, page table, or physical page behind it, a page fault is raised, and only then does the system allocate the page upper directory, page middle directory, page table, and physical page frame that this linear address needs.

 

 

Address space

Paging translates a linear address into a physical address; we call this a mapping. For example, after paging, the linear address 0x00000001 might correspond to the physical address 0xFFFFFF01.

A Linux system has two kinds of address space: the process address space and the kernel address space. Each process has its own 3GB process address space, isolated from every other process: the linear address 0x00000001 in process A and the same linear address in process B refer to different memory, and process A cannot directly reach process B's address space through its own. Linear addresses above 3GB (0xC0000000) belong to the kernel space, which is 1GB in size and spans 0xC0000000 to 0xFFFFFFFF. In the kernel address space, the first 896MB of linear addresses are mapped directly onto the first 896MB of physical addresses, so the kernel linear address 0xC0000001 corresponds to the physical address 0x00000001; the two differ by the constant offset 0xC0000000.
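Because this part of the mapping is a fixed offset, converting between these kernel linear addresses and physical addresses is pure arithmetic. The kernel provides the __pa()/__va() macros for exactly this; the sketch below reproduces the idea with plain functions, assuming the 32-bit PAGE_OFFSET of 0xC0000000 and an address inside the directly mapped first 896MB.

/* Sketch of the fixed-offset direct mapping described above; not the
 * kernel's own macros, just the same arithmetic. */
#define PAGE_OFFSET 0xC0000000UL

static inline unsigned long virt_to_phys_sketch(unsigned long vaddr)
{
    return vaddr - PAGE_OFFSET;   /* e.g. 0xC0000001 -> 0x00000001 */
}

static inline unsigned long phys_to_virt_sketch(unsigned long paddr)
{
    return paddr + PAGE_OFFSET;   /* only valid for the first 896MB */
}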

The Linux kernel divides physical memory into three management areas:

  • ZONE_DMA: contains page frames between 0MB and 16MB. They can be used for DMA by older ISA devices and are mapped directly into the kernel address space.
  • ZONE_NORMAL: contains page frames between 16MB and 896MB. These regular page frames are mapped directly into the kernel address space.
  • ZONE_HIGHMEM: page frames above 896MB. They are not mapped directly; they are accessed through permanent or temporary mappings.

The entire structure is shown below

ZONE_DMA and ZONE_NORMAL are mapped directly into the kernel address space. Only ZONE_HIGHMEM has no direct mapping by default; it is mapped (temporarily or permanently) only when needed.
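For a ZONE_HIGHMEM page frame, obtaining the struct page is therefore not enough: the frame must be mapped before the kernel can touch its contents, typically with kmap() (a permanent mapping, may sleep) or kmap_atomic() (a temporary mapping). A minimal sketch, assuming a kernel-module context and omitting everything but the basic flow:

#include <linux/gfp.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/string.h>

static void highmem_demo(void)
{
    /* This page may come from ZONE_HIGHMEM */
    struct page *page = alloc_pages(GFP_HIGHUSER, 0);
    void *vaddr;

    if (!page)
        return;

    vaddr = kmap(page);          /* establish a kernel mapping for the frame */
    memset(vaddr, 0, PAGE_SIZE); /* now the kernel can access the page */
    kunmap(page);                /* release the mapping */

    __free_pages(page, 0);       /* return the page frame to the allocator */
}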

 

Node and management area descriptors

To support the NUMA architecture, a node is used to describe the memory attached to one location. For an ordinary PC, the whole machine is a single node. A node is described by struct pglist_data:


/* Memory node descriptor. All node descriptors are stored in
 * struct pglist_data *node_data[MAX_NUMNODES]. */
typedef struct pglist_data {
    /* Array of zone (management area) descriptors of this node */
    struct zone node_zones[MAX_NR_ZONES];
    /* Fallback zone lists used by the page allocator */
    struct zonelist node_zonelists[MAX_ZONELISTS];
    /* Number of zones in this node */
    int nr_zones;
#ifdef CONFIG_FLAT_NODE_MEM_MAP     /* means !SPARSEMEM */
    /* Array of page descriptors of this node */
    struct page *node_mem_map;
#ifdef CONFIG_MEMCG
    struct page_cgroup *node_page_cgroup;
#endif
#endif
#ifndef CONFIG_NO_BOOTMEM
    /* Used by the boot-time memory allocator */
    struct bootmem_data *bdata;
#endif
#ifdef CONFIG_MEMORY_HOTPLUG
    spinlock_t node_size_lock;
#endif
    /* Page frame number of the first page frame of this node */
    unsigned long node_start_pfn;
    /* Size of the node excluding holes (in page frames) */
    unsigned long node_present_pages;
    /* Size of the node including holes (in page frames) */
    unsigned long node_spanned_pages;
    /* Node identifier */
    int node_id;
    /* Wait queues used by the kswapd daemon of this node */
    wait_queue_head_t kswapd_wait;
    wait_queue_head_t pfmemalloc_wait;
    /* Pointer to the kswapd kernel thread of this node */
    struct task_struct *kswapd;     /* Protected by mem_hotplug_begin/end() */
    /* log2 of the free block size kswapd should try to create */
    int kswapd_max_order;
    enum zone_type classzone_idx;
#ifdef CONFIG_NUMA_BALANCING
    /* Lock serializing the migrate rate limiting window */
    spinlock_t numabalancing_migrate_lock;
    /* Rate limiting time interval */
    unsigned long numabalancing_migrate_next_window;
    /* Number of pages migrated during the rate limiting time interval */
    unsigned long numabalancing_migrate_nr_pages;
#endif
} pg_data_t;


All node descriptors in the system are stored in the node_data array. Inside the pg_data_t node descriptor, the node_zones array holds all the zone (management area) descriptors of that node. Although physical memory is divided into three zones, the system logically manages four: the extra one is ZONE_MOVABLE, a virtual zone that does not correspond to any region of memory of its own. Its main purpose is to reduce memory fragmentation, and its page frames come entirely from either ZONE_HIGHMEM or ZONE_NORMAL. We will see this later in the initialization code.
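As a small illustration of how these structures hang together, the sketch below (assuming a kernel-module context) walks every online node and prints each of its zones, using only fields from the pg_data_t above and the struct zone shown a little further down. The kernel itself usually does this kind of walk through helpers such as for_each_populated_zone().

#include <linux/mmzone.h>
#include <linux/nodemask.h>
#include <linux/printk.h>

static void dump_zones(void)
{
    int nid;

    for_each_online_node(nid) {
        pg_data_t *pgdat = NODE_DATA(nid);   /* node descriptor of this node */
        int i;

        for (i = 0; i < pgdat->nr_zones; i++) {
            struct zone *z = &pgdat->node_zones[i];

            pr_info("node %d zone %-8s spanned %lu present %lu managed %lu\n",
                    nid, z->name, z->spanned_pages,
                    z->present_pages, z->managed_pages);
        }
    }
}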

Each node also has a kernel thread, kswapd, whose job is to swap rarely used pages held by processes or the kernel out to disk in order to free up more memory.

Now let's look at the zone (management area) descriptor:


/* Zone (management area) descriptor */
struct zone {
    /* Read-mostly fields */

    /* Watermarks, accessed with the *_wmark_pages(zone) macros.
     * Includes pages_min, pages_low and pages_high:
     *  pages_min: threshold used by the page reclaim code and by the
     *             allocator as the minimum watermark
     *  pages_low: typically 5/4 of pages_min
     *  pages_high: typically 3/2 of pages_min */
    unsigned long watermark[NR_WMARK];

    /* Number of page frames that must stay reserved in each lower zone
     * for allocations that cannot fail */
    long lowmem_reserve[MAX_NR_ZONES];

#ifdef CONFIG_NUMA
    int node;
#endif
    /* The target ratio of ACTIVE_ANON to INACTIVE_ANON pages on this
     * zone's LRU. Maintained by the pageout code. */
    unsigned int inactive_ratio;

    /* Pointer to the node this zone belongs to */
    struct pglist_data *zone_pgdat;
    /* Per-CPU page frame cache of this zone */
    struct per_cpu_pageset __percpu *pageset;

    /* This is a per-zone reserve of pages that should not be
     * considered dirtyable memory. */
    unsigned long dirty_balance_reserve;

#ifndef CONFIG_SPARSEMEM
    /* Flags for a pageblock_nr_pages block. See pageblock-flags.h.
     * In SPARSEMEM, this map is stored in struct mem_section. */
    unsigned long *pageblock_flags;
#endif /* CONFIG_SPARSEMEM */

#ifdef CONFIG_NUMA
    /* Zone reclaim becomes active if more unmapped pages exist. */
    unsigned long min_unmapped_pages;
    unsigned long min_slab_pages;
#endif /* CONFIG_NUMA */

    /* zone_start_pfn == zone_start_paddr >> PAGE_SHIFT */
    /* Page frame number of the first page frame of the zone */
    unsigned long zone_start_pfn;

    /* All normally usable pages: total pages (excluding holes) minus
     * reserved pages */
    unsigned long managed_pages;
    /* Total size of the zone in page frames, including holes */
    unsigned long spanned_pages;
    /* Total size of the zone in page frames, excluding holes */
    unsigned long present_pages;

    /* Zone name: "DMA", "Normal" or "HighMem" */
    const char *name;

    /* Number of page blocks on the MIGRATE_RESERVE list of the buddy system */
    int nr_migrate_reserve_block;

#ifdef CONFIG_MEMORY_ISOLATION
    /* Number of isolated pageblocks. It is used to solve incorrect
     * freepage counting caused by racy access to the migratetype of a
     * pageblock. Protected by zone->lock. */
    unsigned long nr_isolate_pageblock;
#endif

#ifdef CONFIG_MEMORY_HOTPLUG
    /* See spanned/present_pages for more description */
    seqlock_t span_seqlock;
#endif

    /* Hash table of wait queues for processes waiting on a page of this zone */
    wait_queue_head_t *wait_table;
    /* Number of entries in the wait queue hash table */
    unsigned long wait_table_hash_nr_entries;
    /* Size (order) of the wait queue hash table */
    unsigned long wait_table_bits;

    ZONE_PADDING(_pad1_)

    /* Write-intensive fields used from the page allocator */

    /* Spinlock protecting the descriptor */
    spinlock_t lock;

    /* Free areas of different sizes: the free page-frame blocks of the
     * buddy system. MAX_ORDER is 11, covering lists of blocks of
     * 1, 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 contiguous frames. */
    struct free_area free_area[MAX_ORDER];

    /* Zone flags */
    unsigned long flags;

    ZONE_PADDING(_pad2_)

    /* Fields used by the page reclaim code */

    /* Spinlock protecting the active and inactive LRU lists */
    spinlock_t lru_lock;
    struct lruvec lruvec;

    /* Evictions & activations on the inactive file list */
    atomic_long_t inactive_age;

    /* When free pages are below this point, additional steps are taken
     * when reading the number of free pages to avoid per-cpu counter
     * drift allowing watermarks to be breached */
    unsigned long percpu_drift_mark;

#if defined CONFIG_COMPACTION || defined CONFIG_CMA
    /* pfn where compaction free scanner should start */
    unsigned long compact_cached_free_pfn;
    /* pfn where async and sync compaction migration scanner should start */
    unsigned long compact_cached_migrate_pfn[2];
#endif

#ifdef CONFIG_COMPACTION
    /* On compaction failure, 1<<compact_defer_shift compactions are
     * skipped before trying again. The number attempted since last
     * failure is tracked with compact_considered. */
    unsigned int compact_considered;
    unsigned int compact_defer_shift;
    int compact_order_failed;
#endif

#if defined CONFIG_COMPACTION || defined CONFIG_CMA
    /* Set to true when the PG_migrate_skip bits should be cleared */
    bool compact_blockskip_flush;
#endif

    ZONE_PADDING(_pad3_)

    /* Statistics of the zone */
    atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS];
} ____cacheline_internodealigned_in_smp;


 

The zone descriptor actually keeps track of the page frames belonging to the zone in two places: struct free_area free_area[MAX_ORDER] and struct per_cpu_pageset __percpu *pageset. free_area is the zone's buddy system, and pageset is the zone's per-CPU page frame cache. Understanding a zone therefore requires understanding both the buddy system and the per-CPU page frame cache.

 

Zone page frame allocator (manages all physical page frames)

ZONE_NORMAL and ZONE_DMA are mapped directly into the kernel address space, but that does not mean kernel code can simply compute physical addresses from linear addresses and start using the memory. The kernel manages every page frame of physical memory through the zone page frame allocator, whose core components are the buddy system and the per-CPU page frame cache (this is not the hardware cache, it merely shares the name). In Linux, the zone page frame allocator manages all physical memory: whether you are the kernel or a process, when you need memory you must ask the allocator for the page frames you are entitled to, and when page frames you own are no longer needed you must release them back to it. For high memory in particular, even after obtaining a page frame from the allocator, you still have to map it before you can use it.

Sometimes the target zone does not have enough free page frames to satisfy an allocation. In that case the system falls back to the other zones, according to the following rules:

  • If the request specifies the DMA zone, page frames can only come from ZONE_DMA.
  • If no zone is specified, the fallback order is ZONE_NORMAL -> ZONE_DMA.
  • If the request specifies the HIGHMEM zone, the fallback order is ZONE_HIGHMEM -> ZONE_NORMAL -> ZONE_DMA.

Note that a single allocation is never satisfied by page frames from two different zones, and when several page frames are requested, the frames handed out by the buddy system are physically contiguous and their number must be a power of two.

The main job of the zone allocator is to hand out page frames, either through the buddy system or through the per-CPU page frame cache. Three structures are involved: the page descriptor, the buddy system, and the per-CPU page frame cache.
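Before looking at the structures, here is a hedged usage sketch of the allocator itself: a request is expressed as an order (2^order contiguous page frames), and the GFP flags select which zone fallback list is used. This is a minimal example for a kernel-module context, not a complete, error-handled implementation.

#include <linux/gfp.h>
#include <linux/mm.h>

static void allocator_demo(void)
{
    /* Four contiguous page frames (order = 2), ZONE_NORMAL then ZONE_DMA */
    struct page *pages = alloc_pages(GFP_KERNEL, 2);

    /* A single page frame that must come from ZONE_DMA */
    struct page *dma_page = alloc_pages(GFP_KERNEL | GFP_DMA, 0);

    if (pages) {
        /* Directly mapped zones: the kernel address is just page_address() */
        void *vaddr = page_address(pages);
        (void)vaddr;
        __free_pages(pages, 2);      /* must be freed with the same order */
    }
    if (dma_page)
        __free_pages(dma_page, 0);
}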

Let's start with the page descriptor. Page descriptors are not exclusive to page frames; they are also reused by the SLAB and SLUB allocators to describe their slabs, but here we look at them as descriptors of pages:


/* Page descriptor: one per physical page frame */
struct page {
    /* First double word block */

    /* A set of flags (such as PG_locked, PG_error); also encodes the zone
     * and node number the page frame belongs to */
    unsigned long flags;        /* Atomic flags, some possibly
                                 * updated asynchronously */
    union {
        /* When the page is in the page cache, points to the address_space;
         * when the page belongs to an anonymous region, points to its
         * anon_vma */
        struct address_space *mapping;
        /* Used by the SLAB descriptor: address of the first object */
        void *s_mem;            /* slab first object */
    };

    /* Second double word */
    struct {
        union {
            /* Used with different meanings by several kernel components,
             * e.g. the offset of the page inside a file mapping or
             * anonymous region, or a swapped-out page identifier */
            pgoff_t index;      /* Our offset within mapping. */
            /* Used by the SLUB descriptor: address of the first free object */
            void *freelist;
            /* Page was allocated from the pfmemalloc reserves */
            bool pfmemalloc;
        };

        union {
#if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \
    defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE)
            /* Used by SLUB: counters updated atomically */
            unsigned long counters;
#else
            /* Used by SLUB */
            unsigned counters;
#endif
            struct {
                union {
                    /* Number of page table entries mapping the frame;
                     * PAGE_BUDDY_MAPCOUNT_VALUE (-128) when the frame is in
                     * the buddy system; -1 when unused; 0 or more when in use */
                    atomic_t _mapcount;

                    struct {    /* Used by SLUB */
                        unsigned inuse:16;
                        unsigned objects:15;
                        unsigned frozen:1;
                    };
                    int units;  /* SLOB */
                };
                /* Reference count of the page frame: -1 means the frame is
                 * free and can be given to any process or to the kernel;
                 * 0 or more means the frame is assigned to one or more
                 * processes or holds kernel data. page_count() returns
                 * _count + 1, i.e. the number of users of the page */
                atomic_t _count;    /* Usage count, see below. */
            };
            /* Used by the SLAB descriptor */
            unsigned int active;    /* SLAB */
        };
    };

    /* Third double word block */
    union {
        /* Least recently used (LRU) list pointers; also used to insert the
         * page into the free lists of the buddy system (only the first page
         * frame of a block is inserted) */
        struct list_head lru;
        struct {                /* slub per cpu partial pages */
            struct page *next;  /* Next partial slab */
#ifdef CONFIG_64BIT
            int pages;          /* Nr of partial slabs left */
            int pobjects;       /* Approximate # of objects */
#else
            short int pages;
            short int pobjects;
#endif
        };
        struct slab *slab_page; /* slab fields */
        struct rcu_head rcu_head;
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && USE_SPLIT_PMD_PTLOCKS
        pgtable_t pmd_huge_pte; /* protected by page->ptl */
#endif
    };

    /* Remainder is not double word aligned */
    union {
        /* Available to the kernel component that is using the page. When
         * the page is free, the buddy system uses it to store the order
         * (the power of 2) of the block; only the first page frame of the
         * block uses it. */
        unsigned long private;
#if USE_SPLIT_PTE_PTLOCKS
#if ALLOC_SPLIT_PTLOCKS
        spinlock_t *ptl;
#else
        spinlock_t ptl;
#endif
#endif
        /* SLAB cache the page belongs to */
        struct kmem_cache *slab_cache;
        /* SL[AU]B: pointer to the first (head) page of a compound page */
        struct page *first_page;    /* Compound tail pages */
    };

#if defined(WANT_PAGE_VIRTUAL)
    /* Kernel virtual address of the page (NULL for highmem pages that are
     * not currently mapped) */
    void *virtual;
#endif /* WANT_PAGE_VIRTUAL */
#ifdef CONFIG_WANT_PAGE_DEBUG_FLAGS
    unsigned long debug_flags;  /* Use atomic bitops on this */
#endif
#ifdef CONFIG_KMEMCHECK
    void *shadow;
#endif
#ifdef LAST_CPUPID_NOT_IN_PAGE_FLAGS
    int _last_cpupid;
#endif
};


struct page describes one page frame. The members we care about here are unsigned long flags, struct list_head lru, and atomic_t _count; a small sketch of reading them through the usual helpers follows the list below.

  • flags: packs a lot of information, including the node number and zone number the page frame belongs to, as well as the page frame's attribute bits.
  • lru: used to link the page descriptor into whatever list currently owns it, such as a buddy system free list or the per-CPU page frame cache.
  • _count: the reference count of the page frame. -1 means the frame is free; 0 or more means it is in use, and the number of users is _count + 1.
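In practice these fields are read through helper functions rather than by poking at the descriptor directly. A minimal sketch, assuming a kernel-module context; the helpers named here are the standard ones for reading the node/zone numbers encoded in flags and the usage counts:

#include <linux/mm.h>
#include <linux/mmzone.h>
#include <linux/printk.h>

static void inspect_page(struct page *page)
{
    pr_info("node=%d zone=%d zone_name=%s count=%d mapcount=%d\n",
            page_to_nid(page),     /* node number stored in page->flags */
            page_zonenum(page),    /* zone number stored in page->flags */
            page_zone(page)->name, /* back-pointer to the zone descriptor */
            page_count(page),      /* _count + 1, number of users */
            page_mapcount(page));  /* number of PTEs mapping the frame */
}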

To limit memory fragmentation, Linux classifies pages into three mobility types:

  • Unmovable pages: have a fixed position in memory and cannot be moved elsewhere. Most pages used by the kernel core are of this type.
  • Reclaimable pages: cannot be moved directly, but can be discarded, because their contents can be regenerated from some backing source. Pages that map file data are of this type; when memory runs short (an allocation fails), reclaim is started to write such pages back and free them.
  • Movable pages: pages used by user-space processes that are not mapped to a specific disk file (heap, stack, shmem shared memory, anonymous mmap mappings) are of this type. They are reached only through process page tables, so they can be copied to a new location simply by updating the page tables. These pages are usually taken from the high memory zone.

  

Buddy system

The main purpose of the buddy system is to reduce external fragmentation of physical memory (SLAB/SLUB reduces internal fragmentation within page frames). It is implemented as an array of struct free_area of length MAX_ORDER (11); the lists hanging off element order hold blocks of 2^order contiguous page frames. The lists in free_area[0] hold single page frames, the lists in free_area[1] hold blocks of 2 physically contiguous page frames (linked through their first frame), the lists in free_area[2] hold blocks of 4 contiguous page frames, and so on up to free_area[10], whose lists hold blocks of 1024 contiguous page frames. A zone's buddy system therefore keeps its free page frames on lists of blocks of 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024 contiguous frames. Because every block on a list is contiguous, only its first page frame is linked into the list; the order tells us how many frames after it belong to the same block. When, for example, 4 contiguous page frames are needed from the normal zone, the system takes the first node from free_area[2] of that zone's buddy system; that page frame and the 3 frames after it are free, and the first frame is returned to the requester.


/* One element of the buddy system, describing blocks of 1, 2, 4, 8, 16, 32,
 * 64, 128, 256, 512 or 1024 contiguous page frames */
struct free_area {
    /* Lists of the first page descriptor of every free block of this size,
     * one list per migrate type */
    struct list_head free_list[MIGRATE_TYPES];
    /* Number of free blocks of this size */
    unsigned long nr_free;
};


 

Because pages are classified by mobility, each block size in the buddy system actually has several lists, one per migrate type, as follows:


enum {
    MIGRATE_UNMOVABLE,      /* Unmovable pages */
    MIGRATE_RECLAIMABLE,    /* Reclaimable pages */
    MIGRATE_MOVABLE,        /* Movable pages */
    MIGRATE_PCPTYPES,       /* Number of types kept on the per-CPU lists */
    MIGRATE_RESERVE = MIGRATE_PCPTYPES,
#ifdef CONFIG_CMA
    MIGRATE_CMA,
#endif
#ifdef CONFIG_MEMORY_ISOLATION
    MIGRATE_ISOLATE,        /* Page frames cannot be allocated from this list;
                             * it is used when moving physical pages between
                             * NUMA nodes, so a page can be moved close to the
                             * CPU that uses it most */
#endif
    MIGRATE_TYPES
};


 

For example, free_area[2], which holds blocks of four contiguous page frames, is structured as shown below.

 

When a request arrives and the list for the required block size has no free block, the buddy system takes a block from the next higher order and splits it, putting the unused half on the lower-order list. Conversely, when contiguous page frames are released, the buddy system tries to merge them with their free neighbours into a block of the next higher order. For example, if 4 page frames are requested but the list of 4-frame blocks is empty, the buddy system takes a block from the list of 8-frame blocks, splits it into two contiguous 4-frame blocks, hands one to the requester and puts the other on the 4-frame list. Freeing works in reverse: the system checks whether the physically adjacent block (the "buddy") of the freed block is also free and, if so, merges them into a block of the next higher order.
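The splitting step can be modelled in a few lines of ordinary C. The toy program below is not kernel code; it only keeps a per-order counter of free blocks (no real lists, no merging), but it reproduces the walk-up-then-split behaviour described above.

#include <stdio.h>

#define MAX_ORDER 11

/* Toy state: one free order-3 block (8 contiguous pages), nothing else */
static unsigned long nr_free[MAX_ORDER] = { 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 };

static int toy_alloc(unsigned int order)
{
    unsigned int cur;

    /* Find the smallest order >= the request that still has a free block */
    for (cur = order; cur < MAX_ORDER; cur++)
        if (nr_free[cur] > 0)
            break;
    if (cur == MAX_ORDER)
        return -1;              /* nothing left: the kernel would try reclaim */

    nr_free[cur]--;             /* take the block off its free list */
    while (cur > order) {       /* split down to the requested size */
        cur--;
        nr_free[cur]++;         /* one half goes back as a free buddy */
        printf("split: put one free block of %u pages on order %u\n",
               1U << cur, cur);
    }
    return 0;                   /* the other half is handed to the caller */
}

int main(void)
{
    /* Ask for 4 contiguous pages (order 2) when only an order-3 block exists */
    if (toy_alloc(2) == 0)
        printf("allocated 4 contiguous pages\n");
    return 0;
}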

 

Per-CPU page frame cache

The per-CPU page frame cache is also an allocator. It works together with the buddy system and is used specifically to hand out single page frames; it maintains doubly linked lists of single page frames, one set per CPU. Why is this extra allocator needed? Because each CPU has its own hardware cache: when a page is read or written, it is loaded into that CPU's hardware cache. If a process frees a page shortly after operating on it, the page's contents are probably still in the hardware cache, so when another process on the same CPU immediately needs a page to write data into, the allocator can hand it this still-cached page, which improves performance considerably.

The per-CPU page frame cache keeps its single page frames on per-CPU doubly linked lists (because every CPU has its own hardware cache). A page that is likely to still be in the hardware cache is called a "hot" page, and one that is unlikely to be is called a "cold" page. Deciding which is which is simple: the more recently a page was freed, the more likely it is to be hot. So a probably-hot page frame is inserted at the head of the list and a probably-cold one at the tail; hot pages are then taken from the head and cold pages from the tail.
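A minimal sketch of that head/tail policy, using the standard list helpers and the per_cpu_pages fields shown below. This mirrors what the kernel does in free_hot_cold_page(), but it is simplified and omits locking and the high/batch handling.

#include <linux/list.h>
#include <linux/mm.h>
#include <linux/mmzone.h>

static void pcp_free_sketch(struct per_cpu_pages *pcp, struct page *page,
                            int migratetype, bool cold)
{
    if (!cold)
        list_add(&page->lru, &pcp->lists[migratetype]);      /* hot: head */
    else
        list_add_tail(&page->lru, &pcp->lists[migratetype]); /* cold: tail */

    pcp->count++;
    /* if (pcp->count >= pcp->high)
     *         return 'batch' pages to the buddy system */
}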

The per-CPU page frame cache can also run out of free page frames (all allocated); in that case it fetches a batch of page frames from the buddy system and adds them to the cache. Conversely, when the cache holds too many page frames, some of them are handed back to the buddy system.

The per-CPU page frame cache is described by struct per_cpu_pageset, whose core is struct per_cpu_pages:


/* Per-CPU page frame cache descriptor, one per CPU per zone */
struct per_cpu_pageset {
    struct per_cpu_pages pcp;
#ifdef CONFIG_NUMA
    s8 expire;
#endif
#ifdef CONFIG_SMP
    s8 stat_threshold;
    s8 vm_stat_diff[NR_VM_ZONE_STAT_ITEMS];
#endif
};

struct per_cpu_pages {
    /* Number of page frames currently in this CPU cache */
    int count;      /* number of pages in the list */
    /* Upper bound: when count exceeds high, batch pages are returned to
     * the buddy system */
    int high;       /* high watermark, emptying needed */
    /* Number of page frames added to or removed from the cache at a time */
    int batch;      /* chunk size for buddy add/remove */

    /* Lists of pages, one per migrate type stored on the pcp-lists */
    struct list_head lists[MIGRATE_PCPTYPES];
};


 

About page frame reclaim

Not every physical page in memory can be reclaimed. Pages occupied by the kernel itself are never swapped out; only pages mapped into user space can be. In general, the following kinds of physical pages can be reclaimed by Linux:

  • Pages occupied by process mappings, including code segments, data segments, the stack, and dynamically allocated memory (the heap used by malloc).
  • Pages occupied in user space by file contents mapped with mmap().
  • Anonymous pages (pages not backed by any file, such as the user-space stack and heap): the process's user-mode stack and anonymously mmap-ed regions (including shared memory regions). Note: pages occupied by the stack are normally not swapped out in practice.
  • Caches belonging to the slab allocator that can be shrunk, such as the dentry cache for directory entries and the inode cache for index nodes.
  • Pages used by the tmpfs file system.

Linux uses the following two mechanisms to check memory usage and decide whether free memory has become too low and page reclaim is needed (a sketch in terms of the zone watermarks follows the list):

  • Periodic checking: performed by the kswapd daemon running in the background. It periodically checks the current memory usage, and when the number of free physical pages drops below a given threshold it starts page reclaim.
  • A "severe memory shortage" event is triggered: in some situations, for example when the kernel must suddenly allocate a large block of memory for a user process through the buddy system, or must create a large buffer, and the system cannot supply enough free physical memory at that moment, the operating system must reclaim pages immediately in order to free enough memory to satisfy the request. This kind of reclaim is also called "direct page reclaim".
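Both triggers can be phrased in terms of the zone watermarks introduced in the zone descriptor above. A hedged sketch (kernel-module context assumed; the kernel's actual decision logic is considerably more involved): kswapd is woken when free pages drop below the "low" watermark, while an allocation that finds free pages below the "min" watermark falls into direct (synchronous) reclaim.

#include <linux/mmzone.h>
#include <linux/vmstat.h>

static bool zone_needs_kswapd(struct zone *z)
{
    /* Free pages have dropped below watermark[WMARK_LOW] */
    return zone_page_state(z, NR_FREE_PAGES) < low_wmark_pages(z);
}

static bool zone_needs_direct_reclaim(struct zone *z)
{
    /* Free pages have dropped below watermark[WMARK_MIN] */
    return zone_page_state(z, NR_FREE_PAGES) < min_wmark_pages(z);
}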

If reclaiming still cannot free enough pages to satisfy the memory request, the operating system has one last resort: the OOM (Out Of Memory) killer, which picks the most suitable process in the system, kills it, and frees all the pages it occupied.

 

In closing

The slab allocator will be covered in the next article; there is too much to fit here. For now, remember that for physical memory the page frame is always the smallest unit of allocation, and every allocation must go through the zone page frame allocator, which in turn hands out frames through the buddy system or the per-CPU page frame cache. When we allocate memory with malloc, or allocate small amounts of memory in the kernel, the slab layer is used; the purpose of slab is to subdivide a page frame into smaller pieces of memory.