Cloud and Virtualization

Cloud computing is a computing model that provides dynamic, scalable resources as services over the Internet. After years of development, it has become an important pillar of enterprise IT. Virtualization is one of its core technologies: a single physical computer is abstracted into multiple logical computers, namely virtual machines (VMs). Each VM is an independent, isolated environment that can run its own operating system without affecting the others. With virtualization, a cloud computing system can schedule resources on demand according to load, improving resource utilization and ensuring that applications and services do not lose quality because of insufficient resources. Virtualization, however, comes at a cost: the performance penalty of abstracting resources, which virtualization technology has long been trying to reduce. Resource abstraction in virtualization falls roughly into three parts: CPU virtualization, memory virtualization, and device virtualization. Device virtualization lets VMs pass devices through, such as network and storage devices, with little or no performance loss. Backed by hardware features, CPU virtualization delivers close to bare-metal performance for ordinary instructions. Memory virtualization, by contrast, still shows a clear gap compared with bare-metal machines and is the problem worth paying attention to.

Memory Virtualization

Virtual memory: When discussing memory virtualization we first have to mention virtual memory. Early operating systems had only physical addresses and limited address space, and every process had to take care not to overwrite other processes' memory. To avoid this, the concept of virtual memory was introduced so that each process gets a contiguous, independent virtual address space. A process uses memory directly through Virtual Addresses (VA). When the CPU accesses memory, the VA it issues is intercepted by the Memory Management Unit (MMU) and translated into a Physical Address (PA). The VA-to-PA mapping is managed in a page table, which the MMU walks automatically during translation.

Memory virtualization: Similar to virtual memory, each VM on a host expects to own an entire physical address space of its own, so memory virtualization must add another layer of abstraction to give every VM an independent address space. As a result, both the VM and the physical machine have their own VA and PA: the Guest Virtual Address (GVA) and Guest Physical Address (GPA) on the guest side, and the Host Virtual Address (HVA) and Host Physical Address (HPA) on the host side. A program in a VM uses GVAs, which ultimately have to be translated into HPAs. The two VA-to-PA mappings (GVA to GPA and HVA to HPA) are managed by page tables, while the GPA-to-HVA mapping is usually a small number of contiguous linear mappings managed by the Virtual Machine Monitor (VMM).

A memory access by a process must be translated from VA to PA, and this translation path changes significantly once memory virtualization is introduced: it becomes GVA -> GPA -> HVA -> HPA. The longer and more complex path raises challenges for both the security and the performance of memory access, which are exactly the two goals of memory virtualization: 1) security, i.e. the validity of address translation, so that a VM cannot access memory that does not belong to it; 2) performance, i.e. the efficiency of address translation, meaning both a low cost of establishing the translation mappings and a low cost of the translation process itself. Many virtualization solutions have been proposed to reach these goals. SPT (Shadow Page Table) and EPT (Extended Page Table) are the two classic and most familiar ones, so let's use them as a starting point before moving on to other solutions.
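To make the lengthened path easier to picture, here is a minimal, purely illustrative sketch in C that models the three mappings as fixed offsets. All constants are made up; a real system resolves each stage through page tables rather than simple arithmetic.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of the virtualized address-translation chain
 *   GVA -> GPA -> HVA -> HPA.
 * Each stage is reduced to adding a made-up offset; a real system
 * resolves each stage by walking a multi-level page table. */

static uint64_t gva_to_gpa(uint64_t gva) { return gva + 0x0000111000ULL;   } /* guest page table   */
static uint64_t gpa_to_hva(uint64_t gpa) { return gpa + 0x7f0000000000ULL; } /* VMM linear mapping */
static uint64_t hva_to_hpa(uint64_t hva) { return hva - 0x7e0000000000ULL; } /* host page table    */

int main(void)
{
    uint64_t gva = 0x400000;            /* address used by a guest process   */
    uint64_t gpa = gva_to_gpa(gva);     /* stage 1: guest page-table walk    */
    uint64_t hva = gpa_to_hva(gpa);     /* stage 2: VMM's GPA -> HVA mapping */
    uint64_t hpa = hva_to_hpa(hva);     /* stage 3: host page-table walk     */
    printf("GVA %#llx -> GPA %#llx -> HVA %#llx -> HPA %#llx\n",
           (unsigned long long)gva, (unsigned long long)gpa,
           (unsigned long long)hva, (unsigned long long)hpa);
    return 0;
}
```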

SPT: Early hardware supported only one layer of page-table translation, so translating VA to PA directly on either the VM or the physical machine could not cover the whole GVA-to-HPA path. SPT therefore builds a shortcut, the shadow page table, which directly manages the GVA-to-HPA mapping, as shown in the figure below. Each shadow page table instance corresponds to one process in a VM, and to build it the VMM must read the page table of that guest process.
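As a rough sketch of how a shadow entry is built (every name and the single-entry "tables" below are hypothetical, not a real VMM interface): when the VMM traps a guest page-table write that maps a GVA to a GPA, it composes that mapping with its own GPA-to-HPA mapping and writes the resulting GVA-to-HPA entry into the shadow page table.

```c
#include <stdint.h>
#include <stdio.h>

/* Minimal sketch of shadow-page-table maintenance; all names and the
 * single-entry "tables" are illustrative only. */

typedef uint64_t gva_t, gpa_t, hpa_t;

/* Toy VMM-owned GPA -> HPA mapping: one contiguous chunk at a fixed base. */
static hpa_t vmm_gpa_to_hpa(gpa_t gpa) { return gpa + 0x80000000ULL; }

/* Toy shadow "page table": a single GVA -> HPA entry. */
static gva_t shadow_gva;
static hpa_t shadow_hpa;

/* Invoked when the VMM traps a guest page-table write mapping gva -> gpa:
 * the shadow table is updated to map gva directly to the matching HPA, so
 * later hardware walks only need this one-layer table. */
static void on_guest_pte_write(gva_t gva, gpa_t gpa)
{
    shadow_gva = gva;
    shadow_hpa = vmm_gpa_to_hpa(gpa);  /* validity is checked here */
}

int main(void)
{
    on_guest_pte_write(0x400000, 0x1000);
    printf("shadow PTE: GVA %#llx -> HPA %#llx\n",
           (unsigned long long)shadow_gva, (unsigned long long)shadow_hpa);
    return 0;
}
```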

Because the shadow page table holds the direct GVA-to-HPA mapping, the SPT translation path is equivalent to that of a physical machine: a single page-table walk completes the translation. With a 4-level page table, the translation process is shown below.
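For reference, the sketch below shows how a 4-level, x86-64-style walk splits a virtual address into four 9-bit indices plus a 12-bit page offset; each index selects one entry at its level, which is why a single walk costs at most four table lookups. The bit layout follows x86-64 conventions, but the code itself is only an illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Split a 48-bit virtual address into the four 9-bit page-table indices and
 * the 12-bit page offset used by an x86-64-style 4-level walk. */
int main(void)
{
    uint64_t va  = 0x00007f1234567890ULL;   /* example address */
    unsigned l4  = (va >> 39) & 0x1ff;      /* level-4 index   */
    unsigned l3  = (va >> 30) & 0x1ff;      /* level-3 index   */
    unsigned l2  = (va >> 21) & 0x1ff;      /* level-2 index   */
    unsigned l1  = (va >> 12) & 0x1ff;      /* level-1 index   */
    unsigned off = va & 0xfff;              /* page offset     */
    printf("L4=%u L3=%u L2=%u L1=%u offset=%#x\n", l4, l3, l2, l1, off);
    return 0;
}
```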

Advantages: the SPT translation process has low overhead, comparable to that of a physical machine. Disadvantages: 1) Establishing the translation mappings is very expensive. To guarantee the validity of address translation, every update of the mapping, that is, every modification of a guest process's page table, is intercepted and trapped into the privileged VMM for execution;

2) The shadow page tables themselves occupy memory, and each shadow page table corresponds to only one process in the VM, so they consume a considerable amount of memory.

EPT: Later hardware added nested page-table support for virtualization, allowing the hardware to perform a two-layer page-table translation automatically. EPT is such a hardware-based solution: an extended page table is added to manage the GPA-to-HPA mapping, as shown in the figure below. The two layers of page tables are independent of each other, and the combined mapping is translated automatically by the hardware.

All the levels of the guest page table (gL4, gL3, gL2, and gL1) hold only GPAs, so every step down to the next level first has to be translated to an HPA through the extended page tables (nL4, nL3, nL2, and nL1), which makes the whole translation path very long. When both page tables have 4 levels, the translation process is shown in the figure below.

Advantages: establishing the translation mappings is cheap, and the independent EPT page table guarantees the validity of address translation, so the VM can modify its own page tables without VMM intervention.

Disadvantages: the translation process is expensive, requiring up to 24 (4 + 4 + 4 * 4) hardware table lookups in the worst case.
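The 24-lookup figure can be reproduced with a back-of-the-envelope count: with g guest page-table levels and n extended page-table levels, the hardware needs n accesses to translate the guest root pointer, plus, for each of the g guest levels, one guest-table access and n nested accesses to translate the GPA it yields, i.e. g*n + g + n accesses in the worst case. A throwaway sketch:

```c
#include <stdio.h>

/* Worst-case memory accesses for a hardware nested (two-dimensional) page
 * walk: n accesses to translate the guest root pointer, then for each of the
 * g guest levels one guest-table access plus n nested accesses to translate
 * the GPA it yields.  Total: n + g * (1 + n) = g*n + g + n. */
static int nested_walk_accesses(int g, int n) { return g * n + g + n; }

int main(void)
{
    /* EPT with a 4-level guest table and a 4-level extended table: 24. */
    printf("worst-case lookups: %d\n", nested_walk_accesses(4, 4));
    return 0;
}
```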

Both classic solutions give a solid guarantee of security, but each has a performance flaw: SPT pays a high price for establishing the translation mappings in order to guarantee their validity, while EPT eliminates that setup cost but ends up with a longer translation path.

Other explorations

Industry and academia have explored many other approaches to memory virtualization. The basic ideas are similar to SPT or EPT and can be divided into three categories:

1) One-layer page table schemes. Similar to SPT, a single page table directly manages the GVA-to-HPA mapping;

2) Two-layer page table schemes. Similar to EPT, two separate page tables manage the GVA-to-GPA and GPA-to-HPA mappings;

3) Hybrid schemes. These combine the first two kinds and select between them dynamically.

Direct Paging: A one-layer page table scheme, which was Xen's paravirtualization solution when early hardware supported only a single layer of page-table translation. The biggest difference from SPT is that no separate GVA-to-GPA guest page table is maintained: the VM knows it is running in a virtualized environment, i.e. it knows that its page table entries contain HPAs. Guest page-table modifications still need to trap into the VMM, but these traps are issued actively by the guest and can be batched, whereas SPT relies on passive interception. When the guest reads its page table it only gets HPAs, and it has to look them up in an M2P (Machine to Physical) table to recover the GPAs.
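A minimal sketch of that read path, with every name and value below made up for illustration: the guest reads a machine frame number out of its own page table and consults the M2P table to recover the guest-physical frame.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of a Direct Paging read path.  The guest page table
 * stores machine (host-physical) frame numbers, so to see its own
 * guest-physical frame the guest consults a machine-to-physical (M2P)
 * table.  The 8-entry table here is a made-up example. */

#define NR_MACHINE_FRAMES 8
static uint64_t m2p_table[NR_MACHINE_FRAMES] = {
    /* machine frame -> guest-physical frame, arbitrary demo values */
    3, 7, 0, 1, 6, 2, 5, 4,
};

static uint64_t machine_to_guest_frame(uint64_t mfn) { return m2p_table[mfn]; }

int main(void)
{
    uint64_t mfn = 5;  /* machine frame read out of the guest's own page table */
    printf("machine frame %llu maps to guest-physical frame %llu\n",
           (unsigned long long)mfn,
           (unsigned long long)machine_to_guest_frame(mfn));
    return 0;
}
```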

Direct Paging also uses a single page table to manage the GVA-to-HPA mapping, so the address translation path is the same as with SPT; with a 4-level page table, at most four table lookups are needed. Advantages: the overhead of the translation process is low, comparable to that of a physical machine.

Disadvantages:

1) Establishing the translation mappings is expensive: all page-table modifications still have to actively trap into the VMM;

2) VMs must be adapted for paravirtualization: the guest has to be aware that its page tables manage GVA-to-HPA mappings.

Direct Segment: A two-layer page table scheme proposed in academia on the basis of new hardware. The GVA-to-GPA mapping is managed in the same way as with EPT, using a multi-level page table, but the GPA-to-HPA mapping uses a segment mechanism: the hardware converts a GPA to an HPA simply by adding an offset.

Although GPA and HPA are not equal, the mapping between them is very simple: the Direct Segment hardware only has to add an offset. The overall translation path therefore differs little from that of a physical machine, costing just a few extra hardware additions. When the VM uses a 4-level page table, the translation path is shown in the figure below, where DS denotes the hardware support for the GPA-to-HPA conversion.
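A sketch of the GPA-to-HPA step under such a segment scheme (base, limit, and offset are arbitrary demo values, not taken from any real design): inside the segment the translation is a single addition, and anything outside it would have to fall back to a normal page walk.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustration of segment-style GPA -> HPA translation: if the GPA falls
 * inside the registered segment, the hardware just adds a fixed offset.
 * Base, limit, and offset are arbitrary demo values. */

#define SEG_BASE   0x00000000ULL     /* first GPA covered by the segment */
#define SEG_LIMIT  0x40000000ULL     /* segment size: 1 GiB              */
#define SEG_OFFSET 0x100000000ULL    /* distance from GPA to HPA         */

static int gpa_to_hpa(uint64_t gpa, uint64_t *hpa)
{
    if (gpa >= SEG_BASE && gpa < SEG_BASE + SEG_LIMIT) {
        *hpa = gpa + SEG_OFFSET;     /* single addition, no page walk    */
        return 0;
    }
    return -1;                       /* outside the segment: would need a
                                        normal page-table walk instead   */
}

int main(void)
{
    uint64_t hpa;
    if (gpa_to_hpa(0x1234000ULL, &hpa) == 0)
        printf("GPA 0x1234000 -> HPA %#llx\n", (unsigned long long)hpa);
    return 0;
}
```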

Advantages: both the cost of establishing the translation mappings and the cost of the translation process are low.

Disadvantages:

1) The hardware must support the segment-based mapping from GPA to HPA, which existing hardware does not provide;

2) A large contiguous segment of memory has to be allocated, which means the host's memory cannot be too fragmented.

Flat EPT: A two-layer page table scheme, also proposed in academia on the basis of new hardware. Overall it is very similar to EPT; the only difference is that EPT manages the GPA-to-HPA mapping with a multi-level page table, typically 4 levels with 512 entries per level, while Flat EPT uses a single-level flat page table with far more than 512 entries.

Similar to EPT, each level of the guest page table holds GPAs, and querying the next level requires a translation to HPA through the flat extended page table (nL4). Because the flat extended page table has only one level, the translation path is much shorter than with EPT. When the VM uses a 4-level page table, the translation path is shown below; in the worst case only 9 (4 + 1 + 4 * 1) table lookups are required.
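One way to picture the flat extended page table, with made-up sizes: instead of a multi-level radix tree it is a single array indexed by the guest-physical frame number, so each GPA-to-HPA step in the nested walk costs exactly one lookup. For an 8 GB guest the array has 8 GB / 4 KB = 2M entries, about 16 MB with 8-byte entries, which matches the figure quoted below.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Toy flat extended page table: one array indexed by guest-physical frame
 * number, so translating a GPA to an HPA is a single lookup.  An 8 GiB
 * guest has 8 GiB / 4 KiB = 2M frames; with 8-byte entries that is about
 * 16 MiB of contiguous memory for the table. */

#define PAGE_SHIFT   12
#define GUEST_FRAMES ((8ULL << 30) >> PAGE_SHIFT)   /* 8 GiB guest */

int main(void)
{
    uint64_t *flat_ept = calloc(GUEST_FRAMES, sizeof(*flat_ept));
    if (!flat_ept) return 1;

    /* Demo entry: guest frame 0x1234 backed by host frame 0xabcd. */
    flat_ept[0x1234] = 0xabcd;

    uint64_t gpa = (0x1234ULL << PAGE_SHIFT) | 0x56;
    uint64_t hpa = (flat_ept[gpa >> PAGE_SHIFT] << PAGE_SHIFT) | (gpa & 0xfff);
    printf("GPA %#llx -> HPA %#llx (one table lookup)\n",
           (unsigned long long)gpa, (unsigned long long)hpa);

    free(flat_ept);
    return 0;
}
```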

Advantages: establishing the translation mappings is cheap, and the translation process is also cheap. Compared with Direct Segment, the memory-allocation requirement is much lower: only a small amount of contiguous memory is needed for the flat extended page table (about 16 MB for an 8 GB VM). Disadvantages: the hardware has to support a flat extended page table, whereas current hardware only supports multi-level extended page tables with 512 entries per level.

Mix SPT and EPT: This scheme was proposed by academia some time ago. Simply put, it switches dynamically between SPT and EPT over time: TLB miss and page fault statistics are collected while the VM runs, and when they cross preset thresholds the mode is switched, as shown in the following figure (a small sketch of this policy also follows the list below):

  • When the TLB miss rate is higher than threshold T1 and the page fault frequency is lower than threshold T2, the system switches from EPT to SPT;
  • When the TLB miss rate is lower than threshold T1 and the page fault frequency is higher than threshold T2, the system switches from SPT to EPT.
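A minimal sketch of this switching policy (the threshold values and statistics gathering are placeholders; picking good values for T1 and T2 is exactly the hard part noted in the disadvantages below):

```c
#include <stdio.h>

/* Sketch of the dynamic SPT/EPT switching policy described above.
 * The thresholds are made-up placeholders, not tuned values. */

enum mmu_mode { MODE_SPT, MODE_EPT };

#define T1_TLB_MISS_RATE       0.05    /* made-up threshold */
#define T2_PAGE_FAULTS_PER_SEC 1000.0  /* made-up threshold */

static enum mmu_mode pick_mode(enum mmu_mode cur, double tlb_miss_rate, double pf_rate)
{
    if (cur == MODE_EPT && tlb_miss_rate > T1_TLB_MISS_RATE && pf_rate < T2_PAGE_FAULTS_PER_SEC)
        return MODE_SPT;   /* long walks hurt, few page-table changes: prefer SPT */
    if (cur == MODE_SPT && tlb_miss_rate < T1_TLB_MISS_RATE && pf_rate > T2_PAGE_FAULTS_PER_SEC)
        return MODE_EPT;   /* frequent page-table changes: prefer EPT */
    return cur;            /* otherwise, stay in the current mode */
}

int main(void)
{
    enum mmu_mode m = pick_mode(MODE_EPT, 0.08, 200.0);
    printf("chosen mode: %s\n", m == MODE_SPT ? "SPT" : "EPT");
    return 0;
}
```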

Advantages: the chance to combine the strengths of SPT and EPT and achieve better overall performance. Disadvantages: 1) the switching thresholds are hard to set, and hardware configuration may affect them; 2) switching between SPT and EPT itself has a cost, mainly the destruction and reconstruction of the shadow page tables.

Conclusion

The obvious advantage of one-layer page tables is the low cost of the translation process, the same as on a physical machine. The problem to solve is reducing the cost of establishing the translations. One possible direction is to give up some security and make page-table changes lighter; another, more practical direction is to use the scheme only in suitable scenarios, i.e. for workloads where page-table changes are infrequent.

The advantage of two-layer page tables is that establishing the translations is cheap and VMs can modify their page tables independently. The problem to consider is how to shorten the translation path. This direction is feasible, but it depends on new hardware support, which is unlikely to arrive in the short term.

The original intention of hybrid page tables is to make full use of the advantages of both, but doing dynamic mode switching well is very difficult: differences in workload and even in hardware may affect the switching effect. Targeted tuning for known workloads is perhaps the way to go.

In the long run, with new hardware, two-layer page tables (especially Flat EPT) are the better solution: address translation can be efficient without sacrificing security or generality. In the short term, however, such hardware is still far off, and it is more practical to further explore and optimize one-layer page table schemes. We will keep exploring more possibilities in memory virtualization, and you are welcome to join the OpenAnolis (Dragon Lizard) community for discussion.


About the author

Zhiheng Tao (Junchuan) joined the Alibaba Cloud operating system, cloud-native underlying system team in 2020 and is currently engaged in performance optimization.

Recruiting

We are an operating system team from Alibaba Cloud. We are looking for talent with experience in system technologies such as kernel, virtualization, containers, networking, storage, and security, who are interested in building the cloud-native underlying system. Please contact us (email: [email protected]).

Join the Dragon Lizard community

Join the WeChat group: add the community assistant, Dragon Lizard Community Xiao Long (WeChat: Openanolis_ASSIS), with the note [Dragon Lizard] to be pulled into the group. Developers and users are welcome to join the Dragon Lizard (OpenAnolis) community, promote its development, and build an active, healthy open-source operating system ecosystem together!

About the Dragon Lizard community

The Dragon Lizard community is a non-profit open-source community formed by enterprises, universities, research institutes, non-profit organizations, and individuals on the basis of voluntariness, equality, open source, and collaboration. It was founded in September 2020 to build an open-source, neutral, and open Linux upstream distribution community and innovation platform.

The short-term goal is to develop Anolis OS as a CentOS replacement and rebuild a distribution compatible with major international Linux vendors. The medium- and long-term goal is to explore and build a future-oriented operating system, establish a unified open-source operating system ecosystem, incubate innovative open-source projects, and help the open-source ecosystem prosper.

Join us to build an open source operating system for the future!

https://openanolis.cn