Source-Changes-HG archive

[src/trunk]: src/sys/arch Welcome PAE inside i386 current.



details:   https://anonhg.NetBSD.org/src/rev/3580f2c6aab5
branches:  trunk
changeset: 756564:3580f2c6aab5
user:      jym <jym%NetBSD.org@localhost>
date:      Sat Jul 24 00:45:54 2010 +0000

description:
Welcome PAE inside i386 current.

This patch is inspired by work previously done by Jeremy Morse, ported by me
to -current, merged with the work previously done for port-xen, together with
additional fixes and improvements.

PAE option is disabled by default in GENERIC (but will be enabled in ALL in
the next few days).

In short, PAE switches the CPU to a mode where physical addresses become
36 bits wide (64 GiB). The virtual address space remains at 32 bits (4 GiB).
To cope with their increased size, physical addresses are manipulated as
64-bit variables by the kernel and the MMU.
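
To make the address arithmetic concrete, here is a minimal sketch of how the
hardware splits a 32-bit virtual address in PAE mode (2 + 9 + 9 + 12 bits),
with physical addresses held in a 64-bit type. All names below (pae_*) are
hypothetical and chosen for illustration; they are not the NetBSD pmap macros.
As the notes in this message say, the kernel itself treats the four L2 pages
as one large page directory rather than walking three levels.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration names; not taken from the NetBSD sources. */
#define PAE_PAGE_SHIFT  12u     /* 4 KiB pages */
#define PAE_L2_SHIFT    21u     /* each L2 entry covers 2 MiB */
#define PAE_L3_SHIFT    30u     /* each L3 entry covers 1 GiB */
#define PAE_INDEX_MASK  0x1ffu  /* 512 eight-byte entries per table page */
#define PAE_OFFSET_MASK 0xfffu

typedef uint64_t pae_paddr_t;   /* physical address: needs more than 32 bits */
typedef uint32_t pae_vaddr_t;   /* virtual address: still 32 bits */

struct pae_va_parts {
	unsigned l3;            /* index into the 4-entry L3 */
	unsigned l2;            /* index into one page-directory page */
	unsigned l1;            /* index into one page-table page */
	unsigned offset;        /* byte offset inside the page */
};

/* Split a virtual address into its PAE table indices. */
static struct pae_va_parts
pae_split_va(pae_vaddr_t va)
{
	struct pae_va_parts p;

	p.l3 = va >> PAE_L3_SHIFT;
	p.l2 = (va >> PAE_L2_SHIFT) & PAE_INDEX_MASK;
	p.l1 = (va >> PAE_PAGE_SHIFT) & PAE_INDEX_MASK;
	p.offset = va & PAE_OFFSET_MASK;
	return p;
}

/* Rebuild the virtual address from its indices. */
static pae_vaddr_t
pae_join_va(struct pae_va_parts p)
{
	assert(p.l3 < 4 && p.l2 < 512 && p.l1 < 512 && p.offset < 4096);
	return ((pae_vaddr_t)p.l3 << PAE_L3_SHIFT) |
	    ((pae_vaddr_t)p.l2 << PAE_L2_SHIFT) |
	    ((pae_vaddr_t)p.l1 << PAE_PAGE_SHIFT) | p.offset;
}

/* Size of a 36-bit physical address space, in bytes (64 GiB). */
static pae_paddr_t
pae_max_phys_bytes(void)
{
	return (pae_paddr_t)1 << 36;
}
```

The 64-bit type matters here: the value returned by pae_max_phys_bytes()
does not fit in a 32-bit paddr_t.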

When supported by the CPU, it also allows the use of the NX/XD bit, which
enforces no-execute permission on a per-page basis.
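
As an illustration of why NX depends on PAE: the bit lives in bit 63 of the
64-bit page table entry, which has no equivalent in the 32-bit non-PAE entry
format. The sketch below uses hypothetical names (EX_PTE_*, ex_*) and a frame
mask that assumes the 36-bit physical addresses described above; it is not
the pmap code from this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration names; not the NetBSD PG_* definitions. */
#define EX_PTE_VALID (UINT64_C(1) << 0)    /* entry is present */
#define EX_PTE_NX    (UINT64_C(1) << 63)   /* no-execute (NX/XD) */
#define EX_PTE_FRAME UINT64_C(0x0000000ffffff000) /* PA bits 12..35 */

/* Build a PTE for a page-aligned physical address below 64 GiB. */
static uint64_t
ex_make_pte(uint64_t pa, int executable)
{
	uint64_t pte;

	assert((pa & ~EX_PTE_FRAME) == 0); /* aligned and within 36 bits */
	pte = pa | EX_PTE_VALID;
	if (!executable)
		pte |= EX_PTE_NX;
	return pte;
}

/* A mapping is executable when the NX bit is clear. */
static int
ex_pte_is_executable(uint64_t pte)
{
	return (pte & EX_PTE_NX) == 0;
}
```

On real hardware the bit is only honoured once the CPU reports NX support
and the feature has been enabled; the sketch only models the entry layout.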

Notes:

- reworked locore.S

- introduce cpu_load_pmap(), used to switch pmap for the curcpu. Because pmap
mappings are handled differently with PAE vs !PAE and Xen vs native, the
details are hidden within this function. This makes it easy to call from
assembly, as some features, like BIOS calls, switch to pmap_kernel before
mapping trampoline code in low memory.

- some changes in bioscall and kvm86_call, to reflect the above.

- the L3 is "pinned" per-CPU, and is only manipulated by a
reduced set of functions within pmap. To track the L3, I added two
elements to struct cpu_info, namely ci_pae_l3_pdirpa (PA of the L3), and
ci_pae_l3_pdir (the L3 VA). The rest of the code considers that it runs "just
like" a normal i386, except that the L2 is 4 pages long (PTP_LEVELS is
still 2).

- similar to the ci_pae_l3_pdir{,pa} variables, amd64's xen_current_user_pgd
becomes an element of cpu_info (slowly paving the way for MP world).

- bootinfo_source struct declaration is modified, to cope with the paddr_t size
change with PAE (it is not correct to assume that bs_addrs is a paddr_t when
compiled with PAE - it should remain 32 bits). bs_addrs is now a
void * array (in the bootloader's code under i386/stand/, bs_addrs
is a physaddr_t, which is an unsigned long).

- fixes in multiboot code (same reason as bootinfo): paddr_t size
change. I use Elf32_* types, use RELOC() where necessary, and move the
memcpy() calls out of the if/else if (I do not expect sym and str tables
to overlap with ELF).

- 64-bit atomic functions for pmap

- all pm_pdirpa accesses are now done through the pmap_pdirpa macro. It
hides the L3/L2 stuff from PAE, as well as the pm_pdirpa change in
struct pmap (it now becomes a PDP_SIZE array, with or without PAE).

- manipulation of recursive mappings ( PDIR_SLOT_{,A}PTEs ) is done via
loops on PDP_SIZE.
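
The loop-on-PDP_SIZE idea from the notes above can be sketched in C as
follows. This is a simplified model of the PAE case only (8-byte entries,
PDP_SIZE of 4), loosely following the fillkpt macro in locore.S; the ex_*
names, the slot number and the flag value used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration names modelling the PAE configuration. */
#define EX_PAGE_SIZE 4096u
#define EX_PDP_SIZE  4u   /* page-directory pages with PAE (1 without) */

/*
 * Model of fillkpt: store n consecutive entries, each pointing one page
 * further than the previous one. The entry value is computed in 32 bits,
 * so the upper 4 bytes of every 8-byte entry end up zero.
 */
static void
ex_fillkpt(uint64_t *table, uint32_t first_pte, size_t n)
{
	uint32_t pte = first_pte;
	size_t i;

	for (i = 0; i < n; i++) {
		table[i] = pte;         /* zero-extended to 64 bits */
		pte += EX_PAGE_SIZE;    /* next physical page */
	}
}

/*
 * Install the recursive mapping: one entry per page-directory page,
 * starting at the given slot, instead of a single entry as without PAE.
 */
static void
ex_install_recursive(uint64_t *pdir, size_t slot, uint32_t pdir_pa,
    uint32_t flags)
{
	assert((pdir_pa & (EX_PAGE_SIZE - 1)) == 0); /* page aligned */
	ex_fillkpt(&pdir[slot], pdir_pa | flags, EX_PDP_SIZE);
}
```

With a single page-directory page the same loop runs once, which is why the
code can be written the same way with or without PAE.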

See also http://mail-index.netbsd.org/port-i386/2010/07/17/msg002062.html

No objections were raised on port-i386@ and port-xen@ for about a week.

XXX kvm(3) will be fixed in another patch to properly handle both PAE and !PAE
kernel dumps (VA => PA macros are slightly different, and need proper 64-bit
PA support in kvm_i386).

XXX Mixing PAE and !PAE modules may lead to unwanted/unexpected results. This
cannot be solved easily, and needs lots of thinking before being declared
safe (paddr_t/bus_addr_t size handling, PD/PT macro abstractions).

diffstat:

 sys/arch/i386/conf/GENERIC     |    5 +-
 sys/arch/i386/i386/bioscall.S  |   21 ++--
 sys/arch/i386/i386/kvm86call.S |   24 +++--
 sys/arch/i386/i386/locore.S    |  106 ++++++++++++++++++--------
 sys/arch/i386/i386/machdep.c   |   23 ++++-
 sys/arch/i386/i386/mptramp.S   |   10 ++-
 sys/arch/i386/i386/multiboot.c |   51 +++++-------
 sys/arch/i386/include/pmap.h   |   96 ++++++++++++++++--------
 sys/arch/x86/include/cpu.h     |   15 +++-
 sys/arch/x86/include/pmap.h    |   14 +--
 sys/arch/x86/x86/cpu.c         |   45 ++++++++++-
 sys/arch/x86/x86/pmap.c        |  162 +++++++++++++++++++++-------------------
 sys/arch/xen/x86/cpu.c         |   65 +++++++++++++++-
 sys/arch/xen/x86/x86_xpmap.c   |   24 +++--
 sys/arch/xen/x86/xenfunc.c     |    8 +-
 15 files changed, 438 insertions(+), 231 deletions(-)

diffs (truncated from 1347 to 300 lines):

diff -r ee38113f16f6 -r 3580f2c6aab5 sys/arch/i386/conf/GENERIC
--- a/sys/arch/i386/conf/GENERIC        Fri Jul 23 22:31:35 2010 +0000
+++ b/sys/arch/i386/conf/GENERIC        Sat Jul 24 00:45:54 2010 +0000
@@ -1,4 +1,4 @@
-# $NetBSD: GENERIC,v 1.988 2010/07/23 00:43:20 jakllsch Exp $
+# $NetBSD: GENERIC,v 1.989 2010/07/24 00:45:54 jym Exp $
 #
 # GENERIC machine description file
 #
@@ -22,7 +22,7 @@
 
 options        INCLUDE_CONFIG_FILE     # embed config file in kernel binary
 
-#ident                 "GENERIC-$Revision: 1.988 $"
+#ident                 "GENERIC-$Revision: 1.989 $"
 
 maxusers       64              # estimated number of users
 
@@ -35,6 +35,7 @@
 # CPU-related options.
 options        VM86            # virtual 8086 emulation
 options        USER_LDT        # user-settable LDT; used by WINE
+#options       PAE             # PAE mode (36 bits physical addressing)
 
 # Enhanced SpeedStep Technology in the Pentium M
 options        ENHANCED_SPEEDSTEP
diff -r ee38113f16f6 -r 3580f2c6aab5 sys/arch/i386/i386/bioscall.S
--- a/sys/arch/i386/i386/bioscall.S     Fri Jul 23 22:31:35 2010 +0000
+++ b/sys/arch/i386/i386/bioscall.S     Sat Jul 24 00:45:54 2010 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: bioscall.S,v 1.8 2008/04/28 20:23:24 martin Exp $ */
+/*     $NetBSD: bioscall.S,v 1.9 2010/07/24 00:45:54 jym Exp $ */
 
 /*-
  * Copyright (c) 1997 The NetBSD Foundation, Inc.
@@ -30,7 +30,7 @@
  */
 
 #include <machine/asm.h>
-__KERNEL_RCSID(0, "$NetBSD: bioscall.S,v 1.8 2008/04/28 20:23:24 martin Exp $");
+__KERNEL_RCSID(0, "$NetBSD: bioscall.S,v 1.9 2010/07/24 00:45:54 jym Exp $");
 
 #include <machine/bioscall.h>
 
@@ -39,8 +39,6 @@
 /* LINTSTUB: include <sys/types.h> */
 /* LINTSTUB: include <machine/bioscall.h> */
 
-       .globl  _C_LABEL(PDPpaddr)      /* from locore.S */
-
        .section ".rodata"
 _C_LABEL(biostramp_image):
        .globl  _C_LABEL(biostramp_image)
@@ -69,11 +67,11 @@
        pushl   %ebp
        movl    %esp,%ebp               /* set up frame ptr */
 
-       movl    %cr3,%eax               /* save PDP base register */
+       /* install lwp0 pmap */
+       movl    _C_LABEL(kernel_pmap_ptr),%eax
        pushl   %eax
-
-       movl    _C_LABEL(PDPpaddr),%eax /* install proc0 PDP */
-       movl    %eax,%cr3
+       call    _C_LABEL(cpu_load_pmap)
+       addl    $4,%esp
 
        movl    $(BIOSTRAMP_BASE),%eax  /* address of trampoline area */
        pushl   12(%ebp)
@@ -81,8 +79,11 @@
        call    *%eax                   /* machdep.c initializes it */
        addl    $8,%esp                 /* clear args from stack */
 
-       popl    %eax
-       movl    %eax,%cr3                       /* restore PTDB register */
+       /* restore pmap - saved value is in curcpu()->ci_pmap */
+       movl    %fs:(CPU_INFO_PMAP),%eax
+       pushl   %eax
+       call    _C_LABEL(cpu_load_pmap)
+       addl    $4,%esp
 
        leave
        ret
diff -r ee38113f16f6 -r 3580f2c6aab5 sys/arch/i386/i386/kvm86call.S
--- a/sys/arch/i386/i386/kvm86call.S    Fri Jul 23 22:31:35 2010 +0000
+++ b/sys/arch/i386/i386/kvm86call.S    Sat Jul 24 00:45:54 2010 +0000
@@ -1,4 +1,4 @@
-/* $NetBSD: kvm86call.S,v 1.9 2008/01/04 15:55:31 yamt Exp $ */
+/* $NetBSD: kvm86call.S,v 1.10 2010/07/24 00:45:54 jym Exp $ */
 
 /*-
  * Copyright (c) 1998 Jonathan Lemon
@@ -34,7 +34,7 @@
 
 #include "assym.h"
 
-__KERNEL_RCSID(0, "$NetBSD: kvm86call.S,v 1.9 2008/01/04 15:55:31 yamt Exp $");
+__KERNEL_RCSID(0, "$NetBSD: kvm86call.S,v 1.10 2010/07/24 00:45:54 jym Exp $");
 
        .data
        .align 4
@@ -79,10 +79,7 @@
        andl    $~0x0200,4(%eax,%edi,1) /* reset "task busy" */
        ltr     %di
 
-       movl    %cr3,%eax
-       pushl   %eax                    /* save address space */
-       movl    PDPpaddr,%ecx
-       movl    %ecx,%ebx
+       movl    _C_LABEL(PDPpaddr),%ebx
        addl    $KERNBASE,%ebx          /* va of Idle PDP */
        movl    0(%ebx),%eax
        pushl   %eax                    /* old pde */
@@ -93,7 +90,12 @@
        movl    vm86newptd,%eax         /* mapping for vm86 page table */
        movl    %eax,0(%ebx)            /* ... install as PDP entry 0 */
 
-       movl    %ecx,%cr3               /* new page tables */
+       /* install Idle pmap (lwp0 pmap) */
+       movl    _C_LABEL(kernel_pmap_ptr),%eax
+       pushl   %eax
+       call    _C_LABEL(cpu_load_pmap)
+       addl    $4,%esp
+
        movl    vm86frame,%esp          /* switch to new stack */
 
        movl    $1,kvm86_incall         /* set flag for trap() */
@@ -129,8 +131,12 @@
        popl    %ebx                    /* saved va of Idle PDP */
        popl    %eax
        movl    %eax,0(%ebx)            /* restore old pde */
-       popl    %eax
-       movl    %eax,%cr3               /* install old page table */
+
+       /* restore pmap - saved value is in curcpu()->ci_pmap */
+       movl    %fs:(CPU_INFO_PMAP),%eax
+       pushl   %eax
+       call    _C_LABEL(cpu_load_pmap)
+       addl    $4,%esp
 
        movl    $0,kvm86_incall         /* reset trapflag */
 
diff -r ee38113f16f6 -r 3580f2c6aab5 sys/arch/i386/i386/locore.S
--- a/sys/arch/i386/i386/locore.S       Fri Jul 23 22:31:35 2010 +0000
+++ b/sys/arch/i386/i386/locore.S       Sat Jul 24 00:45:54 2010 +0000
@@ -1,4 +1,4 @@
-/*     $NetBSD: locore.S,v 1.92 2010/07/15 18:55:27 jym Exp $  */
+/*     $NetBSD: locore.S,v 1.93 2010/07/24 00:45:54 jym Exp $  */
 
 /*
  * Copyright-o-rama!
@@ -129,7 +129,7 @@
  */
 
 #include <machine/asm.h>
-__KERNEL_RCSID(0, "$NetBSD: locore.S,v 1.92 2010/07/15 18:55:27 jym Exp $");
+__KERNEL_RCSID(0, "$NetBSD: locore.S,v 1.93 2010/07/24 00:45:54 jym Exp $");
 
 #include "opt_compat_oldboot.h"
 #include "opt_ddb.h"
@@ -482,29 +482,43 @@
        movl    $_RELOC(tmpstk),%esp    # bootstrap stack end location
 
 /*
- * Virtual address space of kernel:
+ * Virtual address space of kernel, without PAE. The page dir is 1 page long.
  *
  * text | data | bss | [syms] | [blobs] | page dir | proc0 kstack | L1 ptp
  *                                     0          1       2      3
+ *
+ * Virtual address space of kernel, with PAE. We need 4 pages for the page dir
+ * and 1 page for the L3.
+ * text | data | bss | [syms] | [blobs] | L3 | page dir | proc0 kstack | L1 ptp
+ *                                     0    1          5       6      7
  */
+#ifndef PAE
+#define        PROC0_PDIR_OFF  0
+#else
+#define PROC0_L3_OFF   0
+#define PROC0_PDIR_OFF 1 * PAGE_SIZE
+#endif
 
-#define        PROC0_PDIR_OFF  0
-#define        PROC0_STK_OFF   (PROC0_PDIR_OFF + PAGE_SIZE)
+#define        PROC0_STK_OFF   (PROC0_PDIR_OFF + PDP_SIZE * PAGE_SIZE)
 #define        PROC0_PTP1_OFF  (PROC0_STK_OFF + UPAGES * PAGE_SIZE)
 
 /*
- * fillkpt
+ * fillkpt - Fill in a kernel page table
  *     eax = pte (page frame | control | status)
  *     ebx = page table address
  *     ecx = number of pages to map
+ * 
+ * For PAE, each entry is 8 bytes long: we must set the 4 upper bytes to 0.
+ * This is done by the first instruction of fillkpt. In the non-PAE case, this
+ * instruction just clears the page table entry.
  */
 
 #define fillkpt        \
-1:     movl    %eax,(%ebx)     ;       /* store phys addr */ \
-       addl    $4,%ebx         ;       /* next pte/pde */ \
-       addl    $PAGE_SIZE,%eax ;       /* next phys page */ \
-       loop    1b              ;  \
-
+1:     movl    $0,(PDE_SIZE-4)(%ebx)   ;       /* clear bits */        \
+       movl    %eax,(%ebx)             ;       /* store phys addr */   \
+       addl    $PDE_SIZE,%ebx          ;       /* next pte/pde */      \
+       addl    $PAGE_SIZE,%eax         ;       /* next phys page */    \
+       loop    1b                      ;
 
        /* Find end of kernel image. */
        movl    $RELOC(end),%edi
@@ -538,9 +552,14 @@
        incl    %eax            /* one more ptp for VAs stolen by bootstrap */
 1:     movl    %eax,RELOC(nkptp)+1*4
 
-       /* tablesize = (1 + UPAGES + nkptp) << PGSHIFT; */
-       addl    $(1+UPAGES),%eax
+       /* tablesize = (PDP_SIZE + UPAGES + nkptp) << PGSHIFT; */
+       addl    $(PDP_SIZE+UPAGES),%eax
+#ifdef PAE
+       incl    %eax            /* one more page for the L3 PD */
+       shll    $PGSHIFT+1,%eax /* PTP tables are twice larger with PAE */
+#else
        shll    $PGSHIFT,%eax
+#endif
        movl    %eax,RELOC(tablesize)
 
        /* ensure that nkptp covers bootstrap tables */
@@ -578,7 +597,10 @@
         */
        movl    $_RELOC(KERNTEXTOFF),%eax
        movl    %eax,%ecx
-       shrl    $(PGSHIFT-2),%ecx       /* ((n >> PGSHIFT) << 2) for # pdes */
+       shrl    $(PGSHIFT-2),%ecx       /* ((n >> PGSHIFT) << 2) for # pdes */
+#ifdef PAE
+       shll    $1,%ecx                 /* pdes are twice larger with PAE */
+#endif
        addl    %ecx,%ebx
 
        /* Map the kernel text read-only. */
@@ -605,36 +627,51 @@
  * Construct a page table directory.
  */
        /* Set up top level entries for identity mapping */
-       leal    (PROC0_PDIR_OFF)(%esi),%ebx
+       leal    (PROC0_PDIR_OFF)(%esi),%ebx
        leal    (PROC0_PTP1_OFF)(%esi),%eax
        orl     $(PG_V|PG_KW), %eax
        movl    RELOC(nkptp)+1*4,%ecx
        fillkpt
 
        /* Set up top level entries for actual kernel mapping */
-       leal    (PROC0_PDIR_OFF + L2_SLOT_KERNBASE*4)(%esi),%ebx
+       leal    (PROC0_PDIR_OFF + L2_SLOT_KERNBASE*PDE_SIZE)(%esi),%ebx
        leal    (PROC0_PTP1_OFF)(%esi),%eax
        orl     $(PG_V|PG_KW), %eax
        movl    RELOC(nkptp)+1*4,%ecx
        fillkpt
 
        /* Install a PDE recursively mapping page directory as a page table! */
-       leal    (PROC0_PDIR_OFF + PDIR_SLOT_PTE*4)(%esi),%ebx
-       leal    (PROC0_PDIR_OFF)(%esi),%eax
+       leal    (PROC0_PDIR_OFF + PDIR_SLOT_PTE*PDE_SIZE)(%esi),%ebx
+       leal    (PROC0_PDIR_OFF)(%esi),%eax
        orl     $(PG_V|PG_KW),%eax
-       movl    %eax,(%ebx)
- 
+       movl    $PDP_SIZE,%ecx
+       fillkpt
+
+#ifdef PAE
+       /* Fill in proc0 L3 page with entries pointing to the page dirs */
+       leal    (PROC0_L3_OFF)(%esi),%ebx
+       leal    (PROC0_PDIR_OFF)(%esi),%eax
+       orl     $(PG_V),%eax
+       movl    $PDP_SIZE,%ecx
+       fillkpt
+
+       /* Enable PAE mode */
+       movl    %cr4,%eax
+       orl     $CR4_PAE,%eax
+       movl    %eax,%cr4
+#endif
 
        /* Save phys. addr of PDP, for libkvm. */
-       movl    %esi,RELOC(PDPpaddr)
+       leal    (PROC0_PDIR_OFF)(%esi),%eax
+       movl    %eax,RELOC(PDPpaddr)
 
-       /*
-        * Startup checklist:
-        * 1. Load %cr3 with pointer to PDIR.
-        */
+       /*
+        * Startup checklist:
+        * 1. Load %cr3 with pointer to PDIR (or L3 PD page for PAE).
+        */
        movl    %esi,%eax               # phys address of ptd in proc 0


