After the rambling of the previous article, we should have a grasp of the memory hierarchy. The technical details are complicated, but the idea itself is not hard to understand, because it is simply caching. This article moves on to another memory topic: virtual memory. The idea behind it is easy to understand too.
You may or may not have heard of virtual memory. It is one of the most important concepts in computer systems, and a major reason for its success is that it works silently and automatically. In other words, we application programmers do not need to intervene in how it works. But a coder with no ambition is just a brick mover, so as aspiring programmers we should still understand virtual memory. Without it, it is almost impossible to understand how programs work at a deep level, or to understand concepts such as assemblers, linkers, loaders, shared objects, files, and processes.
The last post raised some questions for you to think about:
- No matter what the program is, it is ultimately compiled, directly or indirectly, into 0s and 1s. (If this is new to you, please read my article "A few things to know about cross-platform".) Take this assembly instruction:
mov eax, [0x123456];
It loads the contents of memory address 0x123456 into the eax register. The data of every application is stored together in memory. Suppose the assembly code of a music player references memory address 0x123456. With so many applications running at the same time, it is entirely possible that another application references address 0x123456 as well. So why are there no conflicts and no errors?
- The process is one of the most important concepts in computing. What is a process? A process is one run of a program over a particular set of data: an instance of a program executing in its own address space. Put plainly, a running program gives us an illusion that it has exclusive use of the CPU and memory. The CPU appears to execute the program's instructions one after another without interruption, and all of memory appears to be available for the process's code and data. (This is not precise; in fact, part of memory is always reserved for the kernel.) At this point it feels like owning the whole world: the CPU is mine, the RAM is all mine, everything is mine. Of course, this is just an illusion. But how is the illusion created?
- Programs call library APIs, such as printf() from stdio.h, which every C program uses. When the program runs, the library code has to be loaded into memory. So many programs reference the same library. Does memory have to hold that many copies? Surely not. So how is the library code shared by all processes?
These questions, which get more alarming the longer you think about them, will all be answered in this article.
Physical and virtual addressing
From the point of view of whoever accesses it, main memory is an array of M contiguous byte-sized cells, each with a unique physical address (PA). The first byte has address 0, followed by 1, 2, 3, ..., M-2, M-1. This is called a linear address space. This natural way of accessing memory is known as physical addressing.
Note: when accessing memory, the access time is the same for every address, whether it is address 0 or address M-1.
Among data structures, hash tables are often said to be the fastest for lookups, faster than red-black trees and the like. Why? Because a hash table is essentially built on an array internally, so it is really the array that is fastest. And why is the array the fastest? Because if we know the starting address of the array and the index of an element, we can compute the address of that element in memory directly, and accessing any address in memory always takes the same time. Structures such as linked lists and trees can only be traversed. (Good hash functions are hard to design, though, and that is another topic.)
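The address arithmetic behind that argument can be sketched in a few lines of Python. The base address and element size below are made-up values for illustration only, and element_address is a hypothetical helper, not part of any library.

```python
# Hypothetical values: an array of 4-byte elements that starts at address 0x1000.
BASE_ADDRESS = 0x1000
ELEMENT_SIZE = 4

def element_address(index):
    """Address of element `index`: one multiply and one add, whatever the index."""
    return BASE_ADDRESS + index * ELEMENT_SIZE

print(hex(element_address(0)))    # 0x1000
print(hex(element_address(10)))   # 0x1028
```

The work done is the same for index 0 and index 10, which is why array access does not depend on where the element sits.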
Figure 10: A system using physical addressing
The figure above shows an example of physical addressing: a load instruction that reads the four bytes starting at physical address 4. The CPU passes the instruction and address to main memory across the memory bus. Main memory reads the four bytes starting at physical address 4 and returns them to the CPU.
This article is mainly about virtual memory, that is, about the interaction between L4 main memory and disk. For convenience, "memory" in this article sometimes simply means main memory, so do not mistake it for the L1 or L2 caches. If this does not make sense, be sure to read my previous article, "What is Memory (1): Memory hierarchy", and then come back to this one.
Early computers used physical addressing, but in today's era of multitasking computers, virtual addressing is the norm, as shown below:
Figure 11: A system using virtual addressing
The CPU accesses main memory through a virtual address (VA), which is converted to a physical address before being sent to main memory. The task of converting a virtual address into a physical address is called address translation.
Address translation requires cooperation between the CPU hardware and the operating system. Dedicated hardware on the CPU chip called the Memory Management Unit (MMU) translates virtual addresses on the fly, using a lookup table stored in main memory whose contents are managed by the operating system.
A few modern computer systems still use physical addressing, such as DSPs, embedded systems, and supercomputer systems. The primary job of these systems is to run a single task, unlike general-purpose computers, which must run many. As you can imagine, physical addressing is faster, since there is no translation step. The reasoning is the same as the argument in the cross-platform article that Java is, in theory, slower than C++.
Now that virtual addresses have been introduced, some readers may already see the answer to the questions raised at the start of the article: those addresses are virtual addresses, not real physical memory addresses. With the basic idea in place, let's go through the details.
Process address space
Figure 12: Process address space
The figure above shows the address space of a 64-bit process. When the compiler compiles a program, it lays out the result in a 32-bit or 64-bit address space. Virtual addressing simplifies the work of compilers and linkers. It is also thanks to virtual memory that each process has a large, uniform, private address space. This makes memory easier to manage and protects each process's address space from being corrupted by other processes. It also makes shared libraries possible.
Virtual memory is also a caching idea
Virtual memory treats main memory as a cache for the disk, keeping only the active regions in main memory and moving data back and forth between disk and main memory as needed.
Conceptually, virtual memory is organized as an array of N contiguous byte-sized cells stored on disk. Each byte has a unique virtual address that serves as its index into the array. In this way a mapping is established between virtual memory addresses and disk addresses. The active contents of the array on disk are cached in main memory. As elsewhere in the memory hierarchy, the data on disk (the lower level, L5; see Figure 4 in the previous article) is divided into blocks that serve as the transfer units between disk and main memory (the higher level, L4). Main memory thus acts as a cache for virtual memory (or for the disk).
Virtual memory (VM) systems divide virtual memory into fixed-size blocks called virtual pages (VPs), each a fixed number of bytes in size. Similarly, physical memory is divided into physical pages (PPs) of the same fixed size, also known as page frames.
At any given moment, the set of virtual pages is partitioned into three disjoint subsets:
- Unallocated: pages that the VM system has not yet allocated (or created). They have no data associated with them and therefore occupy no memory or disk space.
- Cached: allocated pages that are currently cached in physical memory.
- Uncached: allocated pages that are mapped to disk but not cached in physical memory.
An unallocated VP takes up no real physical space at all, and this point is important. A 32-bit program has a 4 GB address space, and a 64-bit address space is astronomically large (16,777,216 TB, if I have computed it right), while our computers today typically come with only about 2 TB of disk and 16 GB of memory. If every VP of a 64-bit program had to map to an actual PP, there would be nowhere near enough to go around. Nor is a one-to-one mapping needed: as "Figure 12: Process address space" shows, the address space contains large blank areas. After all, no program can actually use that much address space.
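The sizes quoted above are easy to check with a little arithmetic. The sketch below uses binary units (1 GB = 2^30 bytes, 1 TB = 2^40 bytes), which is an assumption about how the figures in the text were computed.

```python
GIB = 2**30   # one gigabyte, in binary units
TIB = 2**40   # one terabyte, in binary units

space_32 = 2**32   # bytes addressable with 32-bit addresses
space_64 = 2**64   # bytes addressable with 64-bit addresses

print(space_32 // GIB)   # 4         -> the 4 GB address space
print(space_64 // TIB)   # 16777216  -> the "16777216T" figure

# The 2 TB disk from the text, compared with the 64-bit address space.
disk = 2 * TIB
print(space_64 // disk)  # 8388608   -> the address space is millions of times larger
```

This is why a one-to-one mapping from virtual pages to real storage is out of the question, and why unallocated pages must cost nothing.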
Figure 13: The VM system uses main memory as a cache
The figure above shows that in an 8-page virtual memory, virtual pages 0 and 3 have not yet been allocated, so they do not exist on disk. Virtual pages 1, 4, and 6 are cached in physical memory. Virtual pages 2, 5, and 7 have been mapped but not cached in main memory.
Admittedly, the diagram is not labeled correctly: in the VP column, N-p and N-1 should be labeled 3 and 7 respectively, but I could not find a better diagram (and drawing one is too much work). Just keep in mind that we are assuming eight VPs.
Page table
The system must have a way to determine whether a virtual page is cached somewhere in main memory. There are two cases:
- If the virtual page is already in main memory, the system must determine which physical page holds it.
- If the virtual page is not in main memory, the system must determine where the virtual page is stored on disk, select a victim page in physical memory, and copy the virtual page from disk into main memory, replacing the victim page.
These capabilities are provided by a combination of hardware and software: the operating system, the Memory Management Unit (MMU) in the CPU, and a data structure stored in physical memory called the page table, which maps virtual pages to physical pages. The address translation hardware reads the page table every time it converts a virtual address to a physical address.
Figure 14: Page table
The figure above shows the basic structure of a page table, which is an array of page table entries (PTEs). Every page in the virtual address space has a corresponding PTE in the page table. Here we assume that each PTE consists of a valid bit and an n-bit address field. The valid bit indicates whether the virtual page is currently cached in main memory.
- If the valid bit is 1, the virtual page is cached in main memory, and the address field gives the starting location of the corresponding physical page in main memory.
- If the valid bit is 0, a null address field indicates that the virtual page has not been allocated; otherwise the address points to the virtual page's starting location on disk.
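The two rules above can be sketched as a small lookup function. This is only a model of the idea, not how an MMU is built: the PTE class, the lookup helper, and the string labels such as "PP0" and "disk:VP2" are all hypothetical, and the table contents follow the 8-page example from Figure 13 with made-up physical page numbers.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PTE:
    valid: int               # 1 = cached in main memory, 0 = not cached
    address: Optional[str]   # physical page, disk location, or None (null)

# Hypothetical page table: VP0 and VP3 unallocated, VP1/4/6 cached, VP2/5/7 on disk only.
page_table = [
    PTE(0, None),        # VP0
    PTE(1, "PP0"),       # VP1
    PTE(0, "disk:VP2"),  # VP2
    PTE(0, None),        # VP3
    PTE(1, "PP1"),       # VP4
    PTE(0, "disk:VP5"),  # VP5
    PTE(1, "PP2"),       # VP6
    PTE(0, "disk:VP7"),  # VP7
]

def lookup(vpn):
    """Classify a virtual page number using the valid bit and the address field."""
    pte = page_table[vpn]
    if pte.valid == 1:
        return ("hit", pte.address)          # address is a physical page
    if pte.address is None:
        return ("unallocated", None)         # valid bit 0 and null address
    return ("page fault", pte.address)       # valid bit 0, address is on disk

print(lookup(1))   # ('hit', 'PP0')
print(lookup(2))   # ('page fault', 'disk:VP2')
print(lookup(0))   # ('unallocated', None)
```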
Page hits and page faults
In the previous article, "What is Memory (1): Memory hierarchy", we talked about cache hits and misses. Virtual memory is the same caching idea, so the same notions apply here, and the cost of a miss between main memory and disk is far higher. Between L0 and L4, each level is roughly 10 times slower than the one above it, but between L4 main memory and L5 disk the gap is roughly 100,000 times. For this reason, the pages exchanged between main memory and disk are made as large as practical to raise the hit rate, and the operating system uses more sophisticated algorithms for its replacement policy.
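To get a feel for what a 100,000-times gap means, here is a rough back-of-the-envelope model. It assumes, purely for illustration, that a hit costs 1 unit and a miss costs 100,000 units; real systems are more complicated, and average_cost is a hypothetical helper.

```python
MISS_PENALTY = 100_000   # cost of going to disk, relative to one main-memory access

def average_cost(hit_rate):
    """Average access cost, in units of one main-memory access."""
    return hit_rate * 1 + (1 - hit_rate) * MISS_PENALTY

for rate in (0.99, 0.999, 0.99999):
    print(rate, round(average_cost(rate), 2))
```

Under this model a 99% hit rate still leaves the average access about 1,000 times slower than main memory, and 99.9% leaves it about 100 times slower. Only at around 99.999% does the average drop to roughly twice the cost of a hit, which is why the hit rate matters so much here.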
In the previous article, "What is Memory (1): Memory hierarchy", the unit that was replaced each time was called a block, while here we talk about pages. They mean the same thing; the different names exist only for historical reasons.
When the CPU wants to read content contained in a virtual page, if the page is already cached in main memory, it is a page hit. It’s perfect. But if the page is not cached in main memory, it is called a page fault.
Figure 15: A reference to a word in VP3 causes a miss
As shown in the figure above, the CPU references VP3, which is not cached in main memory. The system reads PTE3 from memory and learns that VP3 is not cached, which raises a page fault exception. The page fault exception invokes the kernel's page fault handler, which selects a victim page. As shown in the figure below, the victim chosen is VP4, which is stored in PP3.
Figure 16: VP4 is chosen as the victim
If the contents of VP4 have been modified, the kernel copies them back to disk. Next, the kernel copies VP3 from disk into PP3 in memory and updates PTE3. It then returns to the user process. When the exception handler returns, it restarts the instruction that caused the page fault. When that instruction is executed again, VP3 is already in main memory, so this time it is a page hit.
Figure 17: VP3 is cached to PP3
By convention, the activity of moving pages between disk and memory is called swapping or paging. A page is swapped in only when a miss occurs (that is, the system does not load disk contents into memory ahead of time). This strategy is called demand paging.
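The fault-handling sequence described above can be sketched as a toy simulator. Everything here is an assumption made for illustration: the DemandPager class is hypothetical, and it picks victims in simple FIFO order as a stand-in for the more sophisticated replacement algorithms a real operating system uses.

```python
from collections import OrderedDict

class DemandPager:
    """Toy model of demand paging with a FIFO victim policy (an assumption)."""

    def __init__(self, num_frames):
        self.num_frames = num_frames
        self.resident = OrderedDict()   # vpn -> dirty flag, oldest page first
        self.faults = 0
        self.writebacks = 0

    def access(self, vpn, write=False):
        if vpn in self.resident:                  # page hit
            if write:
                self.resident[vpn] = True         # mark the page as modified
            return "hit"
        self.faults += 1                          # page fault
        if len(self.resident) >= self.num_frames:
            _victim, dirty = self.resident.popitem(last=False)  # evict oldest page
            if dirty:
                self.writebacks += 1              # modified victim goes back to disk
        self.resident[vpn] = write                # page is copied in only on demand
        return "fault"

pager = DemandPager(num_frames=2)
trace = [
    pager.access(4, write=True),   # VP4 loaded and modified
    pager.access(1),               # VP1 loaded, memory is now full
    pager.access(3),               # fault: VP4 is the victim and is written back
    pager.access(3),               # the restarted instruction now hits
]
print(trace)                          # ['fault', 'fault', 'fault', 'hit']
print(pager.faults, pager.writebacks) # 3 1
```

Nothing is loaded before it is referenced, and the only write to disk is for the modified victim, which matches the sequence in Figures 15 to 17.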
We just said that a page fault is an exception. In computer systems, division by zero, reading and writing files, the interrupts mentioned in the previous article, and even try/catch in our own code are all forms of exceptional control flow. For example, division by zero is exception number 0 on Intel CPUs, and it is a fault. Reading and writing files are system calls number 0 and 1 on x86-64 Linux respectively, and system calls are traps. Multitasking context switches, process creation and reclamation, and so on are all closely tied to how the system handles exceptional control flow. Of course, that is a topic for another day, and we will not elaborate here.
Virtual memory as a tool for memory management and memory protection
Naturally, each process has its own page table and its own virtual address space.
Going back to the question at the beginning of the article: for something like the stdio library that every C program calls, it is not feasible to load a separate copy for every process. Only one copy of the stdio library's contents is kept in memory, and it is shared by every process that uses the library.
Figure 18: Shared pages
The page table of the first process maps VP2 to a physical page. The second process also maps its VP2 to that same physical page, so the physical page is shared by both processes.
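A minimal sketch of that arrangement, with two per-process page tables modeled as dictionaries. The page numbers and contents are made up, and read is a hypothetical helper that stands in for address translation followed by a memory access.

```python
# Hypothetical physical memory: physical page number -> contents.
physical_memory = {
    2: "private data of process A",
    5: "private data of process B",
    7: "shared library code",
}

# Each process has its own page table: virtual page number -> physical page number.
page_table_a = {1: 2, 2: 7}
page_table_b = {1: 5, 2: 7}

def read(page_table, vpn):
    """Translate a virtual page number through one process's table, then read."""
    return physical_memory[page_table[vpn]]

print(read(page_table_a, 2))   # shared library code
print(read(page_table_b, 2))   # shared library code
print(read(page_table_a, 1))   # private data of process A
print(read(page_table_b, 1))   # private data of process B
```

The same virtual page number gives different data in different processes when the tables point at different physical pages (VP1), and the same data when they point at the same one (VP2). That covers both the address-conflict question and the library-sharing question from the start of the article.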
At this point, look back at "Figure 12: Process address space": in the address space, the memory-mapped region for shared libraries starts at the same address in every process. Think also of shared memory as a way for processes to communicate. Virtual memory simplifies these sharing mechanisms.
As we all know, C has pointers, which let you operate on memory directly. Thanks to virtual memory, our pointer operations cannot reach the regions of other processes. But even within our own address space, many areas of memory should be off-limits, including not only the kernel's region but also the process's own read-only code segment. Virtual memory provides exactly this kind of memory protection.
The address translation mechanism provides memory access control in a natural way: adding a few extra control bits to each PTE adds permissions. Every time the CPU generates an address, the address translation hardware reads a PTE.
Figure 19: Virtual memory provides memory protection
In the figure above, three additional control bits are added to each PTE. The SUP bit indicates whether the process must be running in kernel mode to access the page, and the READ and WRITE bits control read and write access to the page, respectively. If an instruction violates these permissions, the CPU triggers a fault and passes control to an exception handler in the kernel. This kind of exception is generally called a segmentation fault.
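The permission check can be sketched as follows. This is a software model of what the hardware does on every access, under the assumption that the three bits behave exactly as described above; ProtectedPTE, check_access, and SegmentationFault are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

class SegmentationFault(Exception):
    """Raised when an access violates the PTE's control bits."""

@dataclass
class ProtectedPTE:
    sup: bool     # True: page may only be accessed in kernel mode
    read: bool    # True: page may be read
    write: bool   # True: page may be written

def check_access(pte, op, kernel_mode=False):
    """Return True if the access is allowed, otherwise raise SegmentationFault."""
    if pte.sup and not kernel_mode:
        raise SegmentationFault("user-mode access to a kernel-only page")
    if op == "read" and not pte.read:
        raise SegmentationFault("page is not readable")
    if op == "write" and not pte.write:
        raise SegmentationFault("page is not writable")
    return True

code_page = ProtectedPTE(sup=False, read=True, write=False)   # read-only code segment
kernel_page = ProtectedPTE(sup=True, read=True, write=True)   # kernel-only page

print(check_access(code_page, "read"))    # True
try:
    check_access(code_page, "write")      # writing to read-only code
except SegmentationFault as err:
    print("fault:", err)                  # fault: page is not writable
```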
Segments and pages
We now understand pages: a page is a unit the operating system uses to make main memory easier to manage, and it is invisible to the user. But consider this situation: suppose a page is 1 MB in size, and a program has only 0.5 MB of data in total. Swapping whole pages between memory and disk then clearly wastes memory. So another way to divide memory is by segments. In the example above, a 0.5 MB segment would be swapped between memory and disk, avoiding the waste.
Segments are logical units of information and are divided flexibly according to the user's needs. They are therefore not fixed in size, they are visible to the user, and they provide a two-dimensional address space.
I did not find good material on segments, so my own understanding is not very clear; many articles online just copy one another. As far as I know, assembly programmers can work with segments directly, but do those of us who write high-level languages have APIs for operating on segments? I really do not know much about segments, and I hope readers who do will point out my mistakes in the comments or leave links to relevant articles. I will add to this post later. Thank you.
The role of the swap partition
Readers familiar with Linux will know that it has a swap partition. When the system runs short of physical memory, some physical memory has to be freed for the programs currently running. The freed memory may come from programs that have not done anything for a long time. Its contents are saved temporarily in swap, and when those programs are about to run again, the saved data is restored from swap to memory. The system swaps only when physical memory is insufficient.
Say you opened a music player on your computer but never played a song, then left the computer on for several days without closing the player. As more and more programs run, memory is nearly used up, so the operating system chooses to write the music player's memory state (including its stack and so on) to the swap area on disk. This frees some memory for the other programs that need to run. When you eventually want to listen to music, you go back to the music player. At that point the system reads the player's data back from the swap area on disk into memory, and the program runs again.
Windows has disk space with a similar function: an anonymous area of disk (on drive C) that is not visible to the user.
Special note: Literally, the swap area can also be called virtual memory
The swap area on the hard drive is used as if it were memory (albeit slowly), so it serves to expand memory. In a literal sense, then, swap can also be called virtual memory. But that literal "virtual memory" is a different concept from the virtual memory explained in this article from the perspective of the computer system, so pay special attention to the difference. Some people understand virtual memory to mean swap. The two are not the same thing, so be clear about the concept and purpose of each. Otherwise, when you discuss virtual memory with other people, you may end up talking past each other.
On Linux, this area is called swap. On Windows, this area is literally called "virtual memory" rather than swap. So there are two different meanings of virtual memory, and the reader must keep them apart.
Baidu Encyclopedia's explanation of virtual memory is very confusing
I read the Baidu Encyclopedia entry on virtual memory. Some parts are confused, some are correct, and some actually describe the swap partition. Swap can literally be called virtual memory, but that virtual memory is not this virtual memory. Baidu Encyclopedia is muddled on exactly this point: the entry is long, but because it never draws the distinction, the more you read the more confused you get. I also looked it up on Wikipedia. The entry there is not long, but it contains this important passage:
Note: virtual memory does not just mean "using disk space to extend physical memory"; that is merely extending the memory hierarchy to include the hard drive. Extending memory to disk is only one result of using virtual memory techniques, and the same effect can also be achieved by overlays or by swapping inactive programs and all their data out to disk. Virtual memory is defined by a redefinition of the address space as "contiguous virtual memory addresses", so as to "fool" programs into thinking they are using one large block of contiguous addresses.
So I think the Baidu Encyclopedia explanation is confusing, and the Wikipedia explanation should be the correct one.
That completes my two articles on memory. My knowledge is limited, so if anything here is misunderstood or poorly explained, I welcome readers' criticism and corrections.
Author: www.yaoxiaowen.com
Blog: www.cnblogs.com/yaoxiaowen/
GitHub: github.com/yaowen369
Criticism and corrections of my blog posts are welcome. If you have any questions, you can leave a comment or contact me by email ([email protected]).
Reprints are welcome; please credit the source. Thank you.