Subject: Re: maxusers and its progeny
To: David Paul Zimmerman <dpz@apple.com>
From: None <Chris_G_Demetriou@NIAGARA.NECTAR.CS.CMU.EDU>
List: port-alpha
Date: 01/05/1996 02:26:15
> Using the stock 951220 code and an unexceptional config file, a maxusers 
> of 32 gets me a panic "pmap_enter_ptpage: can't get KPT page" just after 
> loading the kernel.  As I mentioned, much higher maxusers cause it to 
> simply hang around that same point.

can't _get_ KPT page?

i've seen "PT page not entered", but not that one.  "eek."
(btw: i think i may have fixed the bug that caused the "PT page not
entered" panic...  if you're seeing _that_, tell me...  8-)

> Apparently the live version of what I pseudocoded, while in fact letting 
> the system successfully boot with a maxusers of 250, causes it to panic 
> not a long time later in vm/vm_kern.c/kmem_malloc() with "kmem_malloc: 
> kmem_map too small".  I've included the diffs at the end of this message 
> for humor value if nothing else (the kernel config file has a line 
> "options SWAP_RATIO=4").

i could believe this.

> Having spent younger time in Berkeley kernels playing with tty and IP 
> code, this is new territory for me, so please bear with me.

"The VM system is from Pittsburgh.  It's scary."  8-)

> Am I under a 
> misconception thinking that the necessary number of KPT entries is 
> realistically bounded by available physical+virtual memory?  Or am I just 
> making too much of a coding goof, maybe just missed a related case that 
> needs a bit 'o rewriting too?

yes, i think you're missing something: shared data.

it's possible to have enough shared data, even writable, that this is a
problem.

for instance, say you have a program that allocates a 20M virtual
space (however), touches all the pages to make them valid, maps it
write-only (*) then forks N times.

even though you'll only have 20M of real pages (in RAM or swap)
allocated (plus pages for kstacks, etc.), you'll end up needing PTEs
to map 20M * N of address space.

* -- it doesn't even have to map it writable, with the current VM
system.  you can allocate more virtual space than RAM and swap can
support, and if you do, and try to _use_ more than RAM and swap
provides, your system will hang.  This is a known bug in the
machine-independent VM code in NetBSD.
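
To put rough numbers on the fork example above, here is a minimal sketch in C.
It assumes 8k pages and 8-byte PTEs, and that the region starts on an 8M
boundary. The names (ptes_needed, pt_pages_per_proc, pt_bytes_total) are
made up for illustration and are not from the NetBSD pmap code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration: 8k pages, 8-byte PTEs. */
#define ALPHA_PAGE_SIZE 8192ULL
#define ALPHA_PTE_SIZE  8ULL
#define PTES_PER_PAGE   (ALPHA_PAGE_SIZE / ALPHA_PTE_SIZE)   /* 1024 */
#define PT_PAGE_SPAN    (PTES_PER_PAGE * ALPHA_PAGE_SIZE)    /* 8M per PT page */

/* Third-level PTEs needed to map `bytes` of virtual space in one process. */
uint64_t ptes_needed(uint64_t bytes)
{
    return (bytes + ALPHA_PAGE_SIZE - 1) / ALPHA_PAGE_SIZE;
}

/* Third-level page table pages needed per process (region assumed aligned). */
uint64_t pt_pages_per_proc(uint64_t bytes)
{
    return (bytes + PT_PAGE_SPAN - 1) / PT_PAGE_SPAN;
}

/* Total bytes of third-level page table pages across nprocs processes,
 * each of which maps the same `bytes` of (shared) data. */
uint64_t pt_bytes_total(uint64_t bytes, uint64_t nprocs)
{
    assert(nprocs > 0);
    return pt_pages_per_proc(bytes) * ALPHA_PAGE_SIZE * nprocs;
}
```

With these assumptions, pt_pages_per_proc(20ULL << 20) is 3 and
pt_bytes_total(20ULL << 20, 100) is 2457600: the data pages are shared, but
every process still needs its own page table pages, so the page table cost
scales with N while the data cost does not.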


The real problem with the alpha port and kernel VM is that the Alpha
pmap code wants to have the page tables for all user processes
available in kernel memory, linearly mapped, etc., etc., etc.  This is
a shortcoming in the pmap code, and is directly attributable to the
fact that the current pmap code is a hack, barely modified from the
hp300 pmap.  ("hey, but it works."  8-)

This shortcoming is also partially responsible for the fact that user
processes are only given 8G of user address space, not the much
greater amount that the Alpha architecture allows.  The kernel
shouldn't need to be able to map all of the user page tables into
kernel space, in this way.

If you'd like to work on a fix -- it's very non-trivial to fix -- tell
me and i'll do what i can to help you.


> One more question -- I've found the derivations of the numbers you 
> mentioned except for the 8GB.  Is that an explicit or implicit limit or 
> limitation?  It certainly doesn't fit in a 32 bit int, so I'm curious.

On Alphas with an 8k page size (all of the existing ones, but the
architecture allows page size to be CPU-specific, any power of two
from 8k up to 64k):

	The whole of the first-level page table maps 8TB of virtual space,

	One second-level page table page (and therefore one
	first-level PTE) maps 8GB of virtual space,

	One third-level page table page (and therefore one
	second-level PTE) maps 8MB of virtual space,

	One data page is 8KB of space (both physical and virtual),
	and therefore one third-level PTE maps 8KB of virtual space.
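
The figures above fall out of simple arithmetic, sketched here in C. The
sketch assumes 8-byte PTEs (so an 8k page holds 1024 of them). The function
names pte_span and pt_page_span are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of virtual space mapped by ONE PTE at `level`
 * (1 = first level, 3 = third level, which points at data pages). */
uint64_t pte_span(uint64_t page_size, uint64_t pte_size, int level)
{
    assert(level >= 1 && level <= 3);
    uint64_t ptes_per_page = page_size / pte_size;
    uint64_t span = page_size;          /* a third-level PTE maps one page */
    for (int l = 3; l > level; l--)
        span *= ptes_per_page;          /* each level up multiplies the reach */
    return span;
}

/* Bytes of virtual space mapped by one whole page table page at `level`. */
uint64_t pt_page_span(uint64_t page_size, uint64_t pte_size, int level)
{
    return pte_span(page_size, pte_size, level) * (page_size / pte_size);
}
```

For page_size = 8192 and pte_size = 8, pte_span gives 2^13, 2^23 and 2^33
bytes for levels 3, 2 and 1, and pt_page_span(8192, 8, 1) gives 2^43 bytes
for the whole first-level table.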


In NetBSD/Alpha, both the user and kernel virtual address spaces are
limited to 8GB (one level-1 PTE) each.

Why?  because if you do this:
	(1) it's simple to do context switches, and
	(2) the Alpha memory management unit looks an AWFUL lot
	    like an mc68851.  In fact, the only real difference
	    is that the mc68851 provides referenced and modified bits
	    in hardware, but the alpha has to emulate them (using the
	    fault-on-{read,write,execute} bits) in software.
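
As a rough illustration of that emulation, here is a toy model in C. It is
not the NetBSD/Alpha pmap code: the struct, the function names and the bit
values are all hypothetical, and fault-on-execute is left out for brevity.
The idea is to arm a page so that the first read or write traps, record the
access in software, then clear the fault bit so later accesses run at full
speed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define PTE_VALID 0x1u
#define PTE_FOR   0x2u   /* fault on read  */
#define PTE_FOW   0x4u   /* fault on write */

struct soft_page {
    uint32_t pte;        /* simulated hardware PTE bits */
    bool referenced;     /* software-maintained "referenced" bit */
    bool modified;       /* software-maintained "modified" bit */
};

/* Arm the page: every access faults until it has been recorded. */
void page_arm(struct soft_page *pg)
{
    pg->pte = PTE_VALID | PTE_FOR | PTE_FOW;
    pg->referenced = false;
    pg->modified = false;
}

/* Fault handler sketch.  Returns true if the fault was an emulation
 * fault handled here, false if it must go to the regular VM fault path. */
bool page_fault_emulate(struct soft_page *pg, bool is_write)
{
    assert(pg != 0);
    if (!(pg->pte & PTE_VALID))
        return false;
    if (is_write) {
        if (!(pg->pte & PTE_FOW))
            return false;
        pg->referenced = true;
        pg->modified = true;
        pg->pte &= ~(PTE_FOR | PTE_FOW);   /* no more emulation faults */
        return true;
    }
    if (!(pg->pte & PTE_FOR))
        return false;
    pg->referenced = true;
    pg->pte &= ~PTE_FOR;   /* keep FOW set so the first write still traps */
    return true;
}
```

A pageout scan would then read referenced/modified from the software state
and call page_arm() again to restart the tracking.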

You'll note that the current Alpha pmap was created in less than a day
by taking the hp300 pmap (look at the comments), doing a few global
substitutes, running it through unifdef, and changing a few things here
and there to clean it up, make it compile, and wedge it into the
kernel.  In fact, in my initial pmap code (which i ran for more than a
year w/o noticing!) I _didn't_ do the modify/reference bit emulation,
and so paging would have been... a bad thing.  Thankfully, i had
enough RAM.


As noted previously, the pmap code needs a really, really big cleanup,
or, better yet, needs to be rewritten.


chris