NetBSD-Bugs archive


Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)



The following reply was made to PR port-xen/53267; it has been noted by GNATS.

From: John Nemeth <jnemeth%cue.bc.ca@localhost>
To: Robert Elz <kre%munnari.OZ.AU@localhost>, "Cherry G.Mathew" <cherry%zyx.in@localhost>
Cc: gnats-bugs%NetBSD.org@localhost, netbsd-bugs%netbsd.org@localhost
Subject: Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)
Date: Sat, 12 May 2018 00:36:25 -0700

 On May 11,  5:38pm, Robert Elz wrote:
 } 
 } [snip]
 } macro to work anything like the x86 one does - it does not need to
 } inspect bits in %cs (whatever that is).   Other ports do not.   Any method
 } that tells what the mode was before the current interrupt/trap will work.
 
      %cs is the code segment register.  It comes from the days of
 the 8086/8088 (both processors have 16-bit registers, but the 8088
 has an 8-bit bus interface requiring two bus cycles to load a
 register).  At that time, it would be loaded with the physical
 address of where the code segment started in memory, divided by 16.
 It was shifted left four bits and %ip (the instruction pointer,
 i.e. the program counter) was added to get the memory location of
 the next instruction.  These were 16-bit registers, so the result
 was a 20-bit address, giving an address space of 1 MB.  Modern
 processors still start up in what is called "real mode", which is
 basically emulating an 8086/8088 (called "real mode" as addresses
 are real physical addresses, with no MMU translation).  OS startup
 code is responsible for switching the processor to 32-bit/64-bit
 mode as appropriate.  UEFI changes this and switches the processor
 to 32-bit/64-bit mode early on.  A UEFI boot loader is actually
 a UEFI application and doesn't run on bare metal (UEFI even has
 a processor-independent byte code as an option).
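
 As a rough illustration of the real-mode address arithmetic described
 above, here is a minimal C sketch.  The function name
 real_mode_linear() is made up for this example; the 20-bit mask
 models the 8086/8088, which has only 20 address lines.

```c
#include <stdint.h>

/*
 * Real-mode address formation: the 16-bit segment value is shifted
 * left four bits (multiplied by 16) and the 16-bit offset is added.
 * The result is truncated to 20 bits, as on an 8086/8088.
 * real_mode_linear() is a hypothetical name used only here.
 */
static uint32_t
real_mode_linear(uint16_t seg, uint16_t off)
{
	uint32_t addr = ((uint32_t)seg << 4) + off;

	return addr & 0xFFFFFu;	/* 20 address lines => 1 MB */
}
```

 For example, segment 0xF000 with offset 0xFFF0 gives 0xFFFF0, the
 address at which an 8086 starts executing after reset.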
 
     In the 80286 world, %cs (and its cousins) became segment selector
 registers.  In that world, %cs pointed at entries in a table
 containing segment descriptors.  The table contained such things
 as base, length, and permissions for segments.  The 80286 was a
 strictly segmented architecture with no paging capability, but did
 have memory protection (aka a supervisor mode).
 
      80286 introduced rings or privilege levels.  Originally there
 were four.  The bottom two bits of a segment register indicated
 the desired privilege level.  0 was maximum privilege and 3 was
 least privilege.  When you loaded a segment register these bits
 would be compared with the DPL (Descriptor Privilege Level) in the
 segment descriptor.  The processor wouldn't let you access a segment
 with a more privileged (numerically lower) DPL without going through
 some kind of access method, of which there were multiple (trap,
 software interrupt, call gate, etc.).  Most OSes run the kernel in
 ring 0 and userland in ring 3.
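
 To make the selector layout concrete, here is a small C sketch that
 pulls a protected-mode selector apart.  The helper names (sel_rpl(),
 sel_is_ldt(), sel_index(), saved_cs_is_user()) are invented for this
 example and are not the NetBSD macros.  The last helper shows the
 kind of "was the saved %cs in ring 3?" test that works when the
 kernel runs in ring 0.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Protected-mode segment selector layout:
 *   bits 0-1  requested privilege level (0 = most, 3 = least)
 *   bit  2    table indicator (0 = GDT, 1 = LDT)
 *   bits 3-15 index into the descriptor table
 * All names below are hypothetical, chosen for this illustration.
 */
static unsigned
sel_rpl(uint16_t sel)
{
	return sel & 0x3u;
}

static bool
sel_is_ldt(uint16_t sel)
{
	return (sel & 0x4u) != 0;
}

static unsigned
sel_index(uint16_t sel)
{
	return sel >> 3;
}

/* True if a saved %cs value has ring 3 in its low two bits. */
static bool
saved_cs_is_user(uint16_t cs)
{
	return sel_rpl(cs) == 3;
}
```

 A test like saved_cs_is_user() assumes the kernel never runs in ring
 3 itself; if it does, the low bits of %cs alone cannot tell kernel
 from userland.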
 
      The same basic scheme continued with the 80386.  However,
 registers were extended to 32 bits and paging hardware was added.
 It still used segments.  Mapping from an address in a program goes
 through the segmentation hardware then the paging hardware which
 maps from virtual addresses to physical addresses.  Most modern
 OSes set the segment base to 0 and the length to all of memory thus
 effectively taking segmentation out of the picture.
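
 The segmentation half of that translation can be sketched in C as
 below.  This is a simplified model under stated assumptions: the
 struct seg_desc type and seg_to_linear() are made up for the example,
 the limit is taken to be the highest valid offset, and permissions,
 expand-down segments and the paging step are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment descriptor (hypothetical, for illustration). */
struct seg_desc {
	uint32_t base;	/* linear address where the segment starts */
	uint32_t limit;	/* highest valid offset within the segment */
};

/*
 * Segmentation step: check the offset against the limit, then add
 * the base.  Returns false where real hardware would raise a fault.
 * The resulting linear address would then go through paging.
 */
static bool
seg_to_linear(const struct seg_desc *d, uint32_t off, uint32_t *linear)
{
	if (off > d->limit)
		return false;
	*linear = d->base + off;	/* wraps modulo 2^32 */
	return true;
}
```

 With a "flat" descriptor (base 0, limit 0xFFFFFFFF) every offset
 passes the check and the linear address equals the offset, which is
 why the flat model effectively turns segmentation off.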
 
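      x86-64 (which goes under a variety of names) further extended
 the registers to 64 bits.  64-bit mode, formally known as "long
 mode", drops a bunch of legacy stuff including most of segmentation
 (bases and limits are ignored for most segment registers, although
 the low two bits of %cs still give the privilege level).  However,
 I don't know as much about x86-64 as I do about the earlier modes.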
 
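      Rings 1 and 2 still exist in long mode, but AMD dropped segment
 limit checking, and paging only distinguishes ring 3 from rings 0-2,
 so a ring 1 kernel can no longer be kept out of the hypervisor's
 memory.  This is what caused problems for Xen.  On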
 i386, Xen runs the hypervisor in ring 0, the OS kernel in ring 1,
 and the OS userland in ring 3 (this gave the hypervisor protection
 from the guest OS while also giving the guest OS kernel protection
 from its userland).  On x86-64 this is no longer possible.  So,
 the hypervisor runs in ring 0 and the OS runs in ring 3.  The OS
 must trap to the hypervisor to switch between "supervisor" and
 "user" mode.  Without this mechanism the guest OS kernel wouldn't
 have any protection from its userland.  I don't know the details
 of this mechanism but it is apparent that it is involved in the
 problem you observed.
 
      Having to trap to the hypervisor does slow down 64-bit OSes.
 It is part of the impetus behind PVH mode.  In that mode, the OS
 basically thinks it's running on bare metal and has access to ring
 0 and ring 3.  However, it knows to trap to the hypervisor for I/O.
 HVM mode runs unmodified OSes and emulates hardware which requires
 a trap to the hypervisor for every I/O instruction (most I/O
 operations require numerous I/O instructions).
 
      This is probably much more than you wanted to know about the
 low-level details of x86 processors.  :-)  Also, it has been many,
 many years since I've written significant amounts of x86 assembly
 language code, so some of the details might be wrong, but the gist
 should be correct.
 
 }-- End of excerpt from Robert Elz
 

