MIT-6.S081-lab3
Lab: page tables
In this lab you will explore page tables and modify them to to speed up certain system calls and to detect which pages have been accessed.
Before you start coding, read Chapter 3 of the xv6 book, and related files:
kern/memlayout.h
, which captures the layout of memory.kern/vm.c
, which contains most virtual memory (VM) code.kernel/kalloc.c
, which contains code for allocating and freeing physical memory.
It may also help to consult the RISC-V privileged architecture manual.
To start the lab, switch to the pgtbl branch:
$ git fetch
$ git checkout pgtbl
$ make clean
Speed up system calls (easy)
Some operating systems (e.g., Linux) speed up certain system calls by sharing data in a read-only region between userspace and the kernel. This eliminates the need for kernel crossings when performing these system calls. To help you learn how to insert mappings into a page table, your first task is to implement this optimization for the getpid()
system call in xv6.
When each process is created, map one read-only page at USYSCALL (a VA defined in memlayout.h
). At the start of this page, store a struct usyscall
(also defined in memlayout.h
), and initialize it to store the PID of the current process. For this lab, ugetpid()
has been provided on the userspace side and will automatically use the USYSCALL mapping. You will receive full credit for this part of the lab if the ugetpid
test case passes when running pgtbltest
.
When each process is created, map one read-only page at USYSCALL (a VA defined in memlayout.h). At the start of this page, store a struct usyscall (also defined in memlayout.h), and initialize it to store the PID of the current process. For this lab, ugetpid() has been provided on the userspace side and will automatically use the USYSCALL mapping. You will receive full credit for this part of the lab if the ugetpid test case passes when running pgtbltest.
Some hints:
- You can perform the mapping in
proc_pagetable()
inkernel/proc.c
. - Choose permission bits that allow userspace to only read the page.
- You may find that
mappages()
is a useful utility. - Don’t forget to allocate and initialize the page in
allocproc()
. - Make sure to free the page in
freeproc()
.
在proc.h添加struct usyscall 结构体用于内核和进程读取pid
struct usyscall *usyscall;
在allocproc中为进程分配内存,添加usyscall 部分 并将usyscall->pid初始化
static struct proc*
allocproc(void)
{
struct proc *p;
for(p = proc; p < &proc[NPROC]; p++) {
acquire(&p->lock);
if(p->state == UNUSED) {
goto found;
} else {
release(&p->lock);
}
}
return 0;
found:
p->pid = allocpid();
p->state = USED;
// Allocate a trapframe page.
if((p->trapframe = (struct trapframe *)kalloc()) == 0){
freeproc(p);
release(&p->lock);
return 0;
}
//Allocate an usyscall
if((p->usyscall = (struct usyscall*)kalloc()) == 0 )
{
freeproc(p);
release(&p->lock);
return 0;
}
// An empty user page table.
p->pagetable = proc_pagetable(p);
if(p->pagetable == 0){
freeproc(p);
release(&p->lock);
return 0;
}
p->usyscall->pid = p->pid;
// Set up new context to start executing at forkret,
// which returns to user space.
memset(&p->context, 0, sizeof(p->context));
p->context.ra = (uint64)forkret;
p->context.sp = p->kstack + PGSIZE;
return p;
}
在proc_pagetable()中进行映射
// 将USYSCALL 进行映射
if(mappages(pagetable, USYSCALL,PGSIZE,(uint64)p->usyscall
,PTE_R|PTE_U)<0)
{
uvmunmap(pagetable, USYSCALL, 1, 0);
uvmfree(pagetable, 0);
return 0;
}
在freeproc部分添加释放usyscall部分
static void
freeproc(struct proc *p)
{
if(p->trapframe)
kfree((void*)p->trapframe);
p->trapframe = 0;
if(p->usyscall)
kfree((void*)p->usyscall);
p->usyscall = 0;
if(p->pagetable)
proc_freepagetable(p->pagetable, p->sz);
p->pagetable = 0;
p->sz = 0;
p->pid = 0;
p->parent = 0;
p->name[0] = 0;
p->chan = 0;
p->killed = 0;
p->xstate = 0;
p->state = UNUSED;
}
在proc_freepagetable中取消映射
// Free a process's page table, and free the
// physical memory it refers to.
void
proc_freepagetable(pagetable_t pagetable, uint64 sz)
{
uvmunmap(pagetable, TRAMPOLINE, 1, 0);
uvmunmap(pagetable, TRAPFRAME, 1, 0);
uvmunmap(pagetable,USYSCALL,1,0);
uvmfree(pagetable, sz);
}
Which other xv6 system call(s) could be made faster using this shared page? Explain how.
Print a page table (easy)
To help you visualize RISC-V page tables, and perhaps to aid future debugging, your second task is to write a function that prints the contents of a page table.
Define a function called vmprint(). It should take a pagetable_t argument, and print that pagetable in the format described below. Insert if(p->pid==1) vmprint(p->pagetable) in exec.c just before the return argc, to print the first process's page table. You receive full credit for this part of the lab if you pass the pte printout test of make grade.
Now when you start xv6 it should print output like this, describing the page table of the first process at the point when it has just finished exec()
ing init
:
page table 0x0000000087f6e000
..0: pte 0x0000000021fda801 pa 0x0000000087f6a000
.. ..0: pte 0x0000000021fda401 pa 0x0000000087f69000
.. .. ..0: pte 0x0000000021fdac1f pa 0x0000000087f6b000
.. .. ..1: pte 0x0000000021fda00f pa 0x0000000087f68000
.. .. ..2: pte 0x0000000021fd9c1f pa 0x0000000087f67000
..255: pte 0x0000000021fdb401 pa 0x0000000087f6d000
.. ..511: pte 0x0000000021fdb001 pa 0x0000000087f6c000
.. .. ..509: pte 0x0000000021fdd813 pa 0x0000000087f76000
.. .. ..510: pte 0x0000000021fddc07 pa 0x0000000087f77000
.. .. ..511: pte 0x0000000020001c0b pa 0x0000000080007000
The first line displays the argument to vmprint
. After that there is a line for each PTE, including PTEs that refer to page-table pages deeper in the tree. Each PTE line is indented by a number of " .."
that indicates its depth in the tree. Each PTE line shows the PTE index in its page-table page, the pte bits, and the physical address extracted from the PTE. Don’t print PTEs that are not valid. In the above example, the top-level page-table page has mappings for entries 0 and 255. The next level down for entry 0 has only index 0 mapped, and the bottom-level for that index 0 has entries 0, 1, and 2 mapped.
Your code might emit different physical addresses than those shown above. The number of entries and the virtual addresses should be the same.
Some hints:
-
You can put vmprint()
in kernel/vm.c
.
- Use the macros at the end of the file kernel/riscv.h.
- The function
freewalk
may be inspirational. - Define the prototype for
vmprint
in kernel/defs.h so that you can call it from exec.c. - Use
%p
in your printf calls to print out full 64-bit hex PTEs and addresses as shown in the example.
Explain the output of vmprint in terms of Fig 3-4 from the text. What does page 0 contain? What is in page 2? When running in user mode, could the process read/write the memory mapped by page 1? What does the third to last page contain?
根据freewalk(vm.c)函数应该可以知道是以递归的形式进行打印信息。
// Recursively free page-table pages.
// All leaf mappings must already have been removed.
void
freewalk(pagetable_t pagetable)
{
// there are 2^9 = 512 PTEs in a page table.
for(int i = 0; i < 512; i++){
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0){
// this PTE points to a lower-level page table.
uint64 child = PTE2PA(pte);
freewalk((pagetable_t)child);
pagetable[i] = 0;
} else if(pte & PTE_V){
panic("freewalk: leaf");
}
}
kfree((void*)pagetable);
}
我们可以通过数组的形式去访问页表中的PTE,然后通过标志位知道该PTE是一个最后的物理地址,还是指向下一个页表的PTE.
vmprint
void
vmprint(pagetable_t pagetable,int depth)
{
if(depth == 0)printf("page table %p\n",pagetable);
for(int i = 0 ; i <512 ;i++)
{
pte_t pte = pagetable[i];
if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) == 0 )
{
if(depth == 0)printf(".. %d: pte %p pa %p\n",i,pte,PTE2PA(pte));
if(depth == 1)printf(".. .. %d: pte %p pa %p\n",i,pte,PTE2PA(pte));
uint64 child = PTE2PA(pte);
vmprint((pagetable_t)child,depth+1);
}else if((pte & PTE_V) && (pte & (PTE_R|PTE_W|PTE_X)) != 0)
{
printf(".. .. ..%d: pte %p pa %p\n",i,pte,PTE2PA(pte));
}
}
}
Detecting which pages have been accessed (hard)
Some garbage collectors (a form of automatic memory management) can benefit from information about which pages have been accessed (read or write). In this part of the lab, you will add a new feature to xv6 that detects and reports this information to userspace by inspecting the access bits in the RISC-V page table. The RISC-V hardware page walker marks these bits in the PTE whenever it resolves a TLB miss.
Your job is to implement pgaccess(), a system call that reports which pages have been accessed. The system call takes three arguments. First, it takes the starting virtual address of the first user page to check. Second, it takes the number of pages to check. Finally, it takes a user address to a buffer to store the results into a bitmask (a datastructure that uses one bit per page and where the first page corresponds to the least significant bit). You will receive full credit for this part of the lab if the pgaccess test case passes when running pgtbltest.
Some hints:
- Start by implementing
sys_pgaccess()
inkernel/sysproc.c
. - You’ll need to parse arguments using
argaddr()
andargint()
. - For the output bitmask, it’s easier to store a temporary buffer in the kernel and copy it to the user (via
copyout()
) after filling it with the right bits. - It’s okay to set an upper limit on the number of pages that can be scanned.
walk()
inkernel/vm.c
is very useful for finding the right PTEs.- You’ll need to define
PTE_A
, the access bit, inkernel/riscv.h
. Consult the RISC-V manual to determine its value. - Be sure to clear
PTE_A
after checking if it is set. Otherwise, it won’t be possible to determine if the page was accessed since the last timepgaccess()
was called (i.e., the bit will be set forever). vmprint()
may come in handy to debug page tables.
要求我们做什么:
Your job is to implement pgaccess(), a system call that reports which pages have been accessed.
看看argaddr 和 argint
// Fetch the nth 32-bit system call argument.
int
argint(int n, int *ip)
{
*ip = argraw(n);
return 0;
}
// Retrieve an argument as a pointer.
// Doesn't check for legality, since
// copyin/copyout will do that.
int
argaddr(int n, uint64 *ip)
{
*ip = argraw(n);
return 0;
}
static uint64
argraw(int n)
{
struct proc *p = myproc();
switch (n) {
case 0:
return p->trapframe->a0;
case 1:
return p->trapframe->a1;
case 2:
return p->trapframe->a2;
case 3:
return p->trapframe->a3;
case 4:
return p->trapframe->a4;
case 5:
return p->trapframe->a5;
}
panic("argraw");
return -1;
}
提示说walk(kernel/vm.c)函数会有帮助,这个函数就是返回最终的PTE,如果没有的话且alloc!=0就会创建一个
// Return the address of the PTE in page table pagetable
// that corresponds to virtual address va. If alloc!=0,
// create any required page-table pages.
//
// The risc-v Sv39 scheme has three levels of page-table
// pages. A page-table page contains 512 64-bit PTEs.
// A 64-bit virtual address is split into five fields:
// 39..63 -- must be zero.
// 30..38 -- 9 bits of level-2 index.
// 21..29 -- 9 bits of level-1 index.
// 12..20 -- 9 bits of level-0 index.
// 0..11 -- 12 bits of byte offset within the page.
pte_t *
walk(pagetable_t pagetable, uint64 va, int alloc)
{
if(va >= MAXVA)
panic("walk");
for(int level = 2; level > 0; level--) {
pte_t *pte = &pagetable[PX(level, va)];
if(*pte & PTE_V) {
pagetable = (pagetable_t)PTE2PA(*pte);
} else {
if(!alloc || (pagetable = (pde_t*)kalloc()) == 0)
return 0;
memset(pagetable, 0, PGSIZE);
*pte = PA2PTE(pagetable) | PTE_V;
}
}
return &pagetable[PX(0, va)];
}
测试函数pgtbltest中的pgaccess_test测试
void
pgaccess_test()
{
char *buf;
unsigned int abits;
printf("pgaccess_test starting\n");
testname = "pgaccess_test";
buf = malloc(32 * PGSIZE);
if (pgaccess(buf, 32, &abits) < 0)
err("pgaccess failed");
buf[PGSIZE * 1] += 1;
buf[PGSIZE * 2] += 1;
buf[PGSIZE * 30] += 1;
if (pgaccess(buf, 32, &abits) < 0)
err("pgaccess failed");
if (abits != ((1 << 1) | (1 << 2) | (1 << 30)))
err("incorrect access bits set");
free(buf);
printf("pgaccess_test: OK\n");
}
完成pgaccess
首先在在riscv.h中定义PTE_A,这个是参考了https://www.bilibili.com/video/BV1zL4y1t7mG/?spm_id_from=333.788&vd_source=3beb6fdc520324dc1427808056eb9e4e得到的。
在vm.c中添加函数pgaccess_walk,不要忘了在defs.h中声明,这是得到虚拟地址隐射的物理地址。
pte_t*
pgaccess_walk(pagetable_t pagetable,uint64 va)
{
if(va >= MAXVA)
panic("walk");
for(int level = 2; level > 0; level--) {
pte_t *pte = &pagetable[PX(level, va)];
if(*pte & PTE_V) {
pagetable = (pagetable_t)PTE2PA(*pte);
}
}
return &pagetable[PX(0, va)];
}
最后完成
int
sys_pgaccess(void)
{
// lab pgtbl: your code here.
//先读取3个参数
//第一个参数: 第一个需要检查的用户页面开始的虚拟地址 First, it takes the starting virtual address of the first user page to check.
//第二个参数:需要检测的页的数量 Second, it takes the number of pages to check.
//一个buffer去存储结果 是一个掩码--bitmask Finally, it takes a user address to a buffer to store the results into a bitmask
//(a datastructure that uses one bit per page and where the first page corresponds to the least significant bit).
uint64 pagetableAddr;
int checkNumber;
uint64 buffer;
int ret = 0x0000;
if(argaddr(0,&pagetableAddr)<0)
{
return -1;
}
if(argint(1,&checkNumber)<0)
{
return -1;
}
if(argaddr(2,&buffer)<0)
{
return -1;
}
if(checkNumber > 32 )return -1;
//获得该进程的页表
//将结果写入用户空间
struct proc *p = myproc();
for(int i = checkNumber-1 ;i >=0;i--)
{
pte_t* pte = pgaccess_walk(p->pagetable,pagetableAddr+i*PGSIZE);
ret = (ret<<1);
if(*pte & PTE_A)
{
ret = ret|1;
//将PTE_A重置为0
*pte =(*pte)^PTE_A;
}
}
copyout(p->pagetable, buffer,(char*)&ret,4);
return 0;
}
得到结果
$ pgtbltest
ugetpid_test starting
ugetpid_test: OK
pgaccess_test starting
pgaccess_test: OK
pgtbltest: all tests succeeded
Optional challenge exercises—有时间再回头来做吧
- Use super-pages to reduce the number of PTEs in page tables.
- Unmap the first page of a user process so that dereferencing a null pointer will result in a fault. You will have to start the user text segment at, for example, 4096, instead of 0.
- Add a system call that reports dirty pages (modified pages) using
PTE_D
.