NetBSD-Bugs archive


Re: port-xen/53267 (XEN amd64 DomU (Dom0??) NetBSD current USERMODE() fails)



On May 11,  5:38pm, Robert Elz wrote:
} 
} [snip]
} macro to work anything like the x86 one does - it does not need to
} inspect bits in %cs (whatever that is).   Other ports do not.   Any method
} that tells what the mode was before the current interrupt/trap will work.

     %cs is the code segment register.  It comes from the days of
the 8086/8088 (both processors have 16-bit registers, but the 8088
has an 8-bit bus interface requiring two bus cycles to load a
register).  At that time, it would be loaded with the physical
address of where the code segment started in memory, divided by 16.
It was shifted left four bits and %ip (the instruction pointer,
x86's name for the program counter) was added to get the memory
location of the next instruction.  These were 16-bit registers, but
the shift produced 20-bit addresses, thus creating an address space
of 1 MB.  Modern processors still start up in what is called "real
mode" which is basically emulating an 8086/8088 (called "real mode"
as the MMU was disabled).  OS startup code is responsible for
switching the processor to 32-bit/64-bit mode as appropriate.  UEFI
changes this and switches the processor to 32-bit/64-bit mode early
on.  A UEFI boot loader is actually a UEFI application and doesn't
run on bare metal (UEFI even has a processor independent byte code
as an option).
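
     As a minimal sketch of the real-mode address arithmetic
described above (the function name is made up for illustration, and
the mask models the 20 address lines of the original 8086, so results
wrap at 1 MB):

```c
#include <stdint.h>

/*
 * Hypothetical helper: the physical address an 8086 forms in real mode
 * from a 16-bit segment register and a 16-bit offset.
 */
static uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    /* Shift the segment left four bits, then add the offset. */
    uint32_t linear = ((uint32_t)segment << 4) + offset;

    /* The 8086 has only 20 address lines, so the sum wraps at 1 MB. */
    return linear & 0xFFFFFu;
}
```

     For example, segment 0xF000 with offset 0xFFF0 gives 0xFFFF0,
which is where an 8086 fetches its first instruction after reset.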

     In the 80286 world, %cs (and its cousins) became segment selector
registers.  In that world, %cs pointed at entries in a table
containing segment descriptors.  The table contained such things
as base, length, and permissions for segments.  The 80286 was a
strictly segmented architecture with no paging capability, but did
have memory protection (aka a supervisor mode).

     The 80286 introduced rings or privilege levels.  Originally
there were four.  The bottom two bits of a segment register indicated
the requested privilege level.  0 was maximum privilege and 3 was
least privilege.  When you loaded a segment register these bits
would be compared with the DPL (Descriptor Privilege Level) in the
segment descriptor.  The processor wouldn't let you access a segment
that was more privileged (one with a numerically lower DPL) without
going through some kind of access method of which there were multiple
(trap, software interrupt, call gate, etc.).  Most OSes run the
kernel in ring 0 and userland in ring 3.
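
     A rough sketch of how a selector's bits break down and how the
privilege comparison works for a data segment.  The function names
are invented for illustration; the rule shown (the less privileged of
the current and requested levels must not be numerically greater than
the DPL) applies to ordinary data segments only, and code segments
and gates have their own rules:

```c
#include <stdbool.h>
#include <stdint.h>

/* Field layout of a protected-mode segment selector. */
static unsigned sel_rpl(uint16_t sel)    { return sel & 0x3u; }        /* bits 0-1: requested privilege level */
static bool     sel_is_ldt(uint16_t sel) { return (sel >> 2) & 0x1u; } /* bit 2: 0 = GDT, 1 = LDT */
static unsigned sel_index(uint16_t sel)  { return sel >> 3; }          /* bits 3-15: descriptor table index */

/*
 * Hypothetical helper: may code at privilege level cpl load a data
 * segment whose descriptor has the given dpl, using a selector with
 * the given rpl?  Lower numbers mean more privilege.
 */
static bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    unsigned effective = (cpl > rpl) ? cpl : rpl;  /* least privileged of the two */
    return effective <= dpl;
}
```

     A selector value of 0x1B, for instance, decodes as index 3 in
the GDT with a requested privilege level of 3.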

     The same basic scheme continued with the 80386.  However,
registers were extended to 32 bits and paging hardware was added.
It still used segments.  Mapping from an address in a program goes
through the segmentation hardware then the paging hardware which
maps from virtual addresses to physical addresses.  Most modern
OSes set the segment base to 0 and the length to all of memory thus
effectively taking segmentation out of the picture.
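
     To illustrate why a base of 0 and a maximal length take
segmentation out of the picture, here is a simplified sketch of the
segmentation step that runs before paging.  The struct and function
are hypothetical, and the model ignores expand-down segments,
granularity, and permission bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified segment descriptor: only base and limit are modelled. */
struct segment {
    uint32_t base;   /* linear address where the segment starts */
    uint32_t limit;  /* last valid offset within the segment */
};

/*
 * Hypothetical helper: translate a program offset to a linear address.
 * Returns false where real hardware would raise a fault.
 */
static bool seg_to_linear(const struct segment *seg, uint32_t offset,
                          uint32_t *linear)
{
    if (offset > seg->limit)
        return false;
    *linear = seg->base + offset;  /* this result is what paging then maps */
    return true;
}
```

     With base 0 and limit 0xFFFFFFFF the linear address always
equals the offset, so only the paging hardware does any real work.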

     x86-64 (which goes under a variety of names) further extended
the registers to 64 bits.  64-bit mode, formally known as "long
mode", drops a bunch of legacy stuff including most of segmentation.
However, I don't know as much about x86-64 as I do about the earlier
modes.

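     AMD noted that segmentation wasn't used much and dumped most
of it, including segment limit checks, as unused legacy stuff.
Rings 1 and 2 still exist, but paging only distinguishes ring 3 from
rings 0-2, so without segment limits nothing protects ring 0 memory
from ring 1.  This is what caused problems for Xen.  On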
i386, Xen runs the hypervisor in ring 0, the OS kernel in ring 1,
and the OS userland in ring 3 (this gave the hypervisor protection
from the guest OS while also giving the guest OS kernel protection
from its userland).  On x86-64 this is no longer possible.  So,
the hypervisor runs in ring 0 and the OS runs in ring 3.  The OS
must trap to the hypervisor to switch between "supervisor" and
"user" mode.  Without this mechanism the guest OS kernel wouldn't
have any protection from its userland.  I don't know the details
of this mechanism but it is apparent that it is involved in the
problem you observed.
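
     A sketch of why a check that only inspects the privilege bits
of the saved %cs cannot work in that setup.  The function is a
hypothetical stand-in in the spirit of USERMODE(), not NetBSD's
actual macro, and the selector values are illustrative assumptions
rather than values taken from any real GDT:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical test: looks only at the low two bits of the saved %cs. */
static bool usermode_by_cs(uint16_t saved_cs)
{
    return (saved_cs & 0x3u) == 3;
}

/* Illustrative selector values only; real ones depend on the GDT layout. */
enum {
    NATIVE_KERNEL_CS = 0x08, /* privilege bits 0: kernel on bare metal */
    NATIVE_USER_CS   = 0x1B, /* privilege bits 3: userland on bare metal */
    PV_KERNEL_CS     = 0x33, /* assumed: guest kernel also runs in ring 3 */
    PV_USER_CS       = 0x2B  /* assumed: guest userland, ring 3 */
};
```

     On bare metal the test separates the two kernel and user
selectors.  With both guest kernel and guest userland in ring 3 it
answers "user" for either one, so some other record of the previous
mode is needed.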

     Having to trap to the hypervisor does slow down 64-bit OSes.
It is part of the impetus behind PVH mode.  In that mode, the OS
basically thinks it's running on bare metal and has access to ring
0 and ring 3.  However, it knows to trap to the hypervisor for I/O.
HVM mode runs unmodified OSes and emulates hardware which requires
a trap to the hypervisor for every I/O instruction (most I/O
operations require numerous I/O instructions).

     This is probably much more than you wanted to know about the
low level details of x86 processors.  :-)  Also, it has been many
many years since I've written significant amounts of x86 assembly
language code, so some of the details might be wrong, but the gist
should be correct.

}-- End of excerpt from Robert Elz

