Normally there is no need to look up, from user mode, the physical address behind one of a process's virtual addresses, because a user process does not care about physical addresses at all.

Users care about it only in a few scenarios, such as doing DMA from user mode (as DPDK does). It is also useful for debugging, for example to analyze how each page of memory is used or whether a page has been swapped out.

From user mode we cannot walk the process page table to find the physical address behind a virtual address; we do not have the permission. Fortunately, the kernel provides an interface called pagemap that is independent of the hardware architecture. Under /proc/pid/ there is a file called pagemap, which exposes one 64-bit descriptor per virtual page, giving either the physical page frame number (PFN) or the swap location of that page.

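Each 64-bit descriptor is laid out as follows (per the pagemap documentation in the kernel source tree):

  Bits 0-54   page frame number (PFN), if the page is present in RAM
  Bits 0-4    swap type, if the page is swapped out
  Bits 5-54   swap offset, if the page is swapped out
  Bit 55      the PTE is soft-dirty
  Bit 56      the page is exclusively mapped
  Bits 57-60  zero (newer kernels use bit 57 for userfaultfd write-protect)
  Bit 61      the page is file-mapped or shared anonymous memory
  Bit 62      the page is swapped out
  Bit 63      the page is present in RAM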

Different architectures have different MMUs and different page table formats, but the pagemap interface does not depend on any particular page table format. In other words, it is an abstraction over them.
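To make the descriptor format concrete, here is a minimal sketch that splits one 64-bit pagemap descriptor into its fields. The bit positions follow the kernel's pagemap documentation; the macro names, struct pagemap_info and decode_pagemap_entry() are hypothetical names chosen for this illustration, not part of any kernel or library API.

```c
#include <stdint.h>

/* Bit positions as given in the kernel's pagemap documentation. */
#define PM_PFN_MASK    ((1ULL << 55) - 1)   /* bits 0-54 */
#define PM_SOFT_DIRTY  (1ULL << 55)
#define PM_EXCLUSIVE   (1ULL << 56)
#define PM_FILE        (1ULL << 61)
#define PM_SWAPPED     (1ULL << 62)
#define PM_PRESENT     (1ULL << 63)

/* Hypothetical helper type: one decoded pagemap descriptor. */
struct pagemap_info {
    int present, swapped, file_or_shared, exclusive, soft_dirty;
    uint64_t pfn;          /* meaningful only when present */
    unsigned swap_type;    /* meaningful only when swapped */
    uint64_t swap_offset;  /* meaningful only when swapped */
};

static struct pagemap_info decode_pagemap_entry(uint64_t entry)
{
    struct pagemap_info info = {0};

    info.present        = (entry & PM_PRESENT) != 0;
    info.swapped        = (entry & PM_SWAPPED) != 0;
    info.file_or_shared = (entry & PM_FILE) != 0;
    info.exclusive      = (entry & PM_EXCLUSIVE) != 0;
    info.soft_dirty     = (entry & PM_SOFT_DIRTY) != 0;

    if (info.present) {
        /* Bits 0-54 hold the page frame number. */
        info.pfn = entry & PM_PFN_MASK;
    } else if (info.swapped) {
        /* Bits 0-4 hold the swap type, bits 5-54 the swap offset. */
        info.swap_type   = (unsigned)(entry & 0x1f);
        info.swap_offset = (entry & PM_PFN_MASK) >> 5;
    }
    return info;
}
```

The same low 55 bits are interpreted in two different ways, so a caller should check the present and swapped flags before trusting either the PFN or the swap fields.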

Let’s ignore the effects of swap (assume that swap is disabled or that the page is always pinned) and borrow the code DPDK uses to convert a virtual address to a physical address:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define phys_addr_t uint64_t
#define PFN_MASK_SIZE 8

phys_addr_t rte_mem_virt2phy(const void *virtaddr)
{
  int fd, retval;
  uint64_t page, physaddr;
  unsigned long virt_pfn;
  int page_size;
  off_t offset;

  /* standard page size */
  page_size = getpagesize();

  fd = open("/proc/self/pagemap", O_RDONLY);
  if (fd < 0) {
    ...
  }

  /* one 8-byte descriptor per virtual page, indexed by virtual page number */
  virt_pfn = (unsigned long)virtaddr / page_size;
  offset = sizeof(uint64_t) * virt_pfn;
  if (lseek(fd, offset, SEEK_SET) == (off_t) -1) {
    ...
    return -1;
  }

  retval = read(fd, &page, PFN_MASK_SIZE);
  close(fd);
  ...

  /*
   * the pfn (page frame number) are bits 0-54 (see
   * pagemap.txt in linux Documentation)
   */
  if ((page & 0x7fffffffffffffULL) == 0)
    return -1;

  physaddr = ((page & 0x7fffffffffffffULL) * page_size)
           + ((unsigned long)virtaddr % page_size);

  return physaddr;
}

The final step is the key calculation:

       physaddr = ((page & 0x7fffffffffffffULL) * page_size)
                + ((unsigned long)virtaddr % page_size);

page & 0x7fffffffffffffULL extracts the page frame number (PFN) from bits 0-54 of the descriptor. Multiplying it by the page size gives the physical address of the start of the page, and adding the in-page offset, virtaddr % page_size, gives the final physical address.
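The two pieces of arithmetic involved, locating the descriptor in the pagemap file and composing the physical address, can be isolated as pure functions. This is only a sketch for checking the numbers by hand; pagemap_offset() and compose_phys() are hypothetical helper names, not DPDK functions.

```c
#include <stdint.h>

#define PFN_MASK 0x7fffffffffffffULL  /* bits 0-54, same mask as the DPDK code */

/* Byte offset of the descriptor for virt inside the pagemap file:
 * one 8-byte descriptor per virtual page. */
static uint64_t pagemap_offset(uint64_t virt, uint64_t page_size)
{
    return (virt / page_size) * sizeof(uint64_t);
}

/* Combine the PFN taken from a descriptor with the in-page offset of virt. */
static uint64_t compose_phys(uint64_t entry, uint64_t virt, uint64_t page_size)
{
    return (entry & PFN_MASK) * page_size + virt % page_size;
}
```

For example, with 4096-byte pages and the virtual address 0x7f81e402a010, the descriptor sits at byte offset 0x3fc0f20150 in the pagemap file. If that descriptor carries PFN 0x2b601, the physical address is 0x2b601 * 0x1000 + 0x010 = 0x2b601010.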

Call the above function to convert the address:

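int main(int argc, char *argv[])
{
  uint8_t *p = malloc(1024 * 1024);

  if (p == NULL)
    return 1;

  /* Write to each page first so that the kernel backs it with a frame. */
  *(p + 4096) = 10;
  /* phys_addr_t is an integer, so print it with %llx rather than %p. */
  printf("virt:%p phys:0x%llx\n", (void *)(p + 4096),
         (unsigned long long)rte_mem_virt2phy(p + 4096));

  *(p + 2 * 4096) = 10;
  printf("virt:%p phys:0x%llx\n", (void *)(p + 2 * 4096),
         (unsigned long long)rte_mem_virt2phy(p + 2 * 4096));

  free(p);
  return 0;
}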

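Running it produces the following output. The program is run with sudo because, since Linux 4.2, the kernel reports the PFN field as zero to processes that lack CAP_SYS_ADMIN: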

~$ sudo ./a.out 
virt:0x7f81e402a010 phys:0x2b601010
virt:0x7f81e402b010 phys:0x3ceec010
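The DPDK-derived function assumes the page is resident and that the PFN is visible to the caller. A more defensive variant might look like the sketch below, which reads the descriptor with pread() and refuses to return an address when the page is not present or the PFN field is zero (which is what an unprivileged process sees). The names virt2phys_checked() and VIRT2PHYS_BAD are hypothetical; this is an illustration, not DPDK code.

```c
#define _GNU_SOURCE               /* for pread() under strict -std modes */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define VIRT2PHYS_BAD UINT64_MAX            /* hypothetical error value */
#define PM_PFN_MASK   ((1ULL << 55) - 1)    /* bits 0-54 */
#define PM_PRESENT    (1ULL << 63)

/* Translate virt to a physical address, or VIRT2PHYS_BAD on any failure. */
static uint64_t virt2phys_checked(const void *virt)
{
    long page_size = sysconf(_SC_PAGESIZE);
    uintptr_t addr = (uintptr_t)virt;
    uint64_t entry;
    ssize_t n;
    int fd;

    if (page_size <= 0)
        return VIRT2PHYS_BAD;

    fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return VIRT2PHYS_BAD;

    /* pread() seeks and reads in one call; one 8-byte descriptor per page. */
    n = pread(fd, &entry, sizeof(entry),
              (off_t)(addr / (uintptr_t)page_size * sizeof(entry)));
    close(fd);

    if (n != (ssize_t)sizeof(entry))
        return VIRT2PHYS_BAD;
    if (!(entry & PM_PRESENT))          /* swapped out or never touched */
        return VIRT2PHYS_BAD;
    if ((entry & PM_PFN_MASK) == 0)     /* PFN hidden from this caller */
        return VIRT2PHYS_BAD;

    return (entry & PM_PFN_MASK) * (uint64_t)page_size
         + addr % (uintptr_t)page_size;
}
```

Even with these checks the result is only a snapshot: unless the page is pinned (for example with mlock()), the kernel is free to move or swap it after the descriptor has been read.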

The kernel side of the pagemap proc interface is implemented in:

fs/proc/task_mmu.c

The core of it is the function that converts a PTE into a pagemap entry (pte_to_pagemap_entry). Interested readers can study it carefully in the kernel source.

Pay special attention to how the flags in the pagemap entry are set.
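As a reading aid, the way the flags are combined with the frame number can be modeled in user space. The sketch below is a simplified model, not the kernel source: struct model_pte is a hypothetical stand-in for a real PTE plus its page state, the PM_* names mirror the constants in fs/proc/task_mmu.c as we understand them, and show_pfn models the privilege check that decides whether the PFN is exposed.

```c
#include <stdint.h>

#define PM_PFRAME_MASK     ((1ULL << 55) - 1)  /* bits 0-54 */
#define PM_SOFT_DIRTY      (1ULL << 55)
#define PM_MMAP_EXCLUSIVE  (1ULL << 56)
#define PM_FILE            (1ULL << 61)
#define PM_SWAP            (1ULL << 62)
#define PM_PRESENT         (1ULL << 63)
#define SWAP_TYPE_BITS     5                   /* swap type lives in bits 0-4 */

/* Hypothetical, simplified view of a PTE and the page behind it. */
struct model_pte {
    int present, swapped, soft_dirty, file_backed, exclusive;
    uint64_t pfn;
    unsigned swap_type;
    uint64_t swap_offset;
};

static uint64_t model_pagemap_entry(const struct model_pte *pte, int show_pfn)
{
    uint64_t frame = 0, flags = 0;

    if (pte->present) {
        if (show_pfn)               /* PFN is exposed to privileged callers only */
            frame = pte->pfn;
        flags |= PM_PRESENT;
        if (pte->soft_dirty)
            flags |= PM_SOFT_DIRTY;
    } else if (pte->swapped) {
        /* Swap type in the low bits, swap offset above it. */
        frame = pte->swap_type | (pte->swap_offset << SWAP_TYPE_BITS);
        flags |= PM_SWAP;
    }
    if (pte->file_backed)
        flags |= PM_FILE;
    if (pte->exclusive)
        flags |= PM_MMAP_EXCLUSIVE;

    /* The frame is confined to bits 0-54; the flags occupy the high bits. */
    return (frame & PM_PFRAME_MASK) | flags;
}
```

Note how a present page seen by an unprivileged caller still gets the present flag, but with a frame of zero, which is why the user-mode code treats a zero PFN as a failure.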


This article comes from: Linux code field, author: Rong Baohua