Re: Silly question - a further one : TLB flush
According to a recent article published on appleinsider.com, Unices'
i386 kernels, although they can address the entirety of the 4 GiB
memory space in both user and kernel mode, suffer a TLB flush
penalty each time a system call (trap) or interrupt is entered,
because the OS has to switch the MMU between the user and kernel VM
spaces, and thus flush the TLB before reloading it. This penalty is not
incurred by 64-bit kernels, because the whole VM space can safely be
divided between user and kernel space without any overlap, and thus
the TLB can hold "global" VM page translation info (something
32-bit Windows versions actually do, limiting user memory space to 2 or
3 GiB at most).
Could you provide me with a link to this article please?
What follows should be reviewed by gurus, but that's what I understood
while browsing through NetBSD's code. It may contain mistakes or
inaccuracies.
For i386, NetBSD uses a 3GB/1GB memory split and a flat address space
(the descriptor table is loaded with a segment starting at address 0 and
ending at the 4GB boundary). As a consequence, the MMU contains both
user and kernel mappings, which do not require a TLB flush when
switching between user and kernel mode.
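
To sketch what that means in practice (illustrative C only, not
NetBSD's actual pmap code; PDIR_SLOT_KERN, NPDE and pmap_pdir_init are
made-up names): on a 3GB/1GB split, the kernel's page directory entries
can be copied into every new process's page directory, so the kernel
mappings are present in every address space and entering the kernel
needs no %cr3 reload:

#include <stdint.h>
#include <string.h>

#define PDIR_SLOT_KERN	768	/* 0xc0000000 >> 22: first PDE of kernel space */
#define NPDE		1024	/* PDEs in a 32-bit non-PAE page directory */

/* Share the kernel's top 1GB with a freshly created address space. */
void
pmap_pdir_init(uint32_t *new_pdir, const uint32_t *kernel_pdir)
{
	/* user 3GB: empty until faulted in */
	memset(new_pdir, 0, PDIR_SLOT_KERN * sizeof(uint32_t));
	/* kernel 1GB: the same PDEs in every pmap, so no flush on entry */
	memcpy(&new_pdir[PDIR_SLOT_KERN], &kernel_pdir[PDIR_SLOT_KERN],
	    (NPDE - PDIR_SLOT_KERN) * sizeof(uint32_t));
}
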
If you want to see what the memory layout is for a specific port, you
should read the comments in pmap. They explain it all. IIRC,
Windows uses a 2GB/2GB memory split.
What you are describing does happen for amd64 OSes running under Xen,
though. x86_64 removed most of segmentation, which Xen relies on for
i386 (where the 4 ring levels plus segment limits let the guest kernel
run in ring 1, isolated from both hypervisor and user space); on amd64
that effectively leaves a privileged and an unprivileged ring. Since
the hypervisor runs in the privileged ring (aka ring 0), the guest OS
is put in the unprivileged one (ring 3), both user and kernel space.
So you may have to update protections between user and kernel on each
transition, which requires local TLB flushes.
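
To make that concrete: on native hardware the kernel can mark its own
translations "global" so they survive a %cr3 write, which is exactly
what a 64-bit Xen PV guest loses by living in ring 3 behind
hypervisor-managed page tables. A hedged sketch (the PG_* values follow
the x86 PTE bit layout; mkpte_kernel is a made-up illustration, not Xen
or NetBSD code):

#include <stdint.h>

#define PG_V	0x001u	/* bit 0: present/valid */
#define PG_RW	0x002u	/* bit 1: writable */
#define PG_G	0x100u	/* bit 8: global, kept across %cr3 writes (CR4.PGE) */

/* A native kernel can build its PTEs with PG_G so kernel TLB entries
 * survive address-space switches; a Xen PV amd64 guest cannot rely on
 * this, since user and kernel run on distinct page tables and the
 * switch between them must invalidate the stale translations. */
static inline uint32_t
mkpte_kernel(uint32_t pa)
{
	return (pa & ~0xfffu) | PG_V | PG_RW | PG_G;
}
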
Note that increasing the size and number of registers does have
penalties, as you have to store them somewhere (the stack) when context
switching. This penalizes microkernel-based OSes, since you are
switching frequently, compared to bigger, monolithic ones.
This must be benchmarked though, as it is pure speculation on my part. I
have never run such a performance assessment myself.
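
For what it's worth, a minimal sketch of such a benchmark (plain C,
nothing NetBSD-specific; it assumes getpid(2) really enters the kernel,
which some libcs defeat by caching the result). Comparing the per-call
figure between a native kernel and a Xen PV guest would show the
transition cost directly:

#include <stdio.h>
#include <time.h>
#include <unistd.h>

int
main(void)
{
	enum { N = 1000000 };
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < N; i++)
		(void)getpid();		/* cheap trap: user -> kernel -> user */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9
	    + (t1.tv_nsec - t0.tv_nsec);
	printf("%.1f ns per syscall (average over %d calls)\n", ns / N, N);
	return 0;
}
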