Subject: Re: kernel stack overflow on netbsd-1-6 branch
To: SAITOH Masanobu <masanobu@iij.ad.jp>
From: David Laight <david@l8s.co.uk>
List: port-i386
Date: 11/26/2002 13:55:30
> panic: trap on DR0: maybe kernel stack overflow
> 
> Stopped in pid 1 (init) at      cpu_Debugger+0x4:       leave
> db> trace
> cpu_Debugger(c106cb00,e325a000,5,0,0) at cpu_Debugger+0x4
> 32: panic(c0506380,0,0,0,c106cb00) at panic+0xad
> 80: trap() at trap+0x185
> --- trap (number 5) ---
> 152: pmap_extract(c06487e0,c106cb00,c8f12000,2000,0) at pmap_extract+0x1
> 80: _bus_dmamap_load(c06487e0,c106cb00,c8f12000,2000,0,109,6,0) at _bus_dmamap_load+0x4f
...
> 400: sys_execve(e325a000,e3259f80,e3259f70,0) at sys_execve+0xe4
> 112: start_init(e325a000) at start_init+0x1c5

I'd dump the stack pointer, not the frame size.
That might show whether the trap is spurious.

Adding that lot up gives about 3k of stack.
sizeof (struct user) is about 1k.
USPACE is 2 pages or 8k (4 pages if NOREDZONE isn't defined).

So something is awry somewhere.
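
For what it's worth, the sum above as a quick sketch. The sizes are
just the approximate figures quoted in this mail, not values read out
of the headers, and the names are made up for illustration:

```c
/* Approximate figures from the mail; none are taken from param.h. */
enum {
    PAGE_BYTES   = 4096,
    USPACE_BYTES = 2 * PAGE_BYTES,  /* 2 pages, 8k */
    USER_BYTES   = 1024,            /* sizeof (struct user), about 1k */
    TRACE_BYTES  = 3 * 1024         /* frames in the trace, about 3k */
};

/* Bytes left for the stack once struct user is taken out of USPACE. */
static long stack_budget(long uspace, long user_size)
{
    return uspace - user_size;
}

/* Headroom remaining; a negative result means the frames don't fit. */
static long stack_headroom(long uspace, long user_size, long used)
{
    return stack_budget(uspace, user_size) - used;
}
```

With those numbers the budget is about 7k and the headroom about 4k,
so the frames in the trace shouldn't come close to the limit.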

A bit of inspection with ddb on a current GENERIC(ish) kernel shows
that the kernel stack starts 16k above the user area and heads down
towards it.

I'm not exactly sure which kind of access debug register dr0 is
set to trap on, but it can only watch a 4 byte range, so it isn't
a fat lot of use for checking stack overflow.
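
To illustrate the 4 byte limit, here is a sketch of how breakpoint 0
is described in DR7 on x86 (L0 in bit 0, R/W0 in bits 16-17, LEN0 in
bits 18-19). It only shows the hardware encoding; I am not claiming
this is what the kernel actually loads. The function names are
hypothetical, and watch_hits() assumes dr0 is aligned to the length:

```c
#include <assert.h>
#include <stdint.h>

/* R/W field values: which kind of access trips the breakpoint. */
#define DR7_RW_EXEC   0x0u   /* instruction fetch */
#define DR7_RW_WRITE  0x1u   /* data writes only */
#define DR7_RW_RDWR   0x3u   /* data reads or writes */

/* LEN field encoding: 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes. */
static uint32_t dr7_len_bits(unsigned len)
{
    assert(len == 1 || len == 2 || len == 4);
    return len - 1;   /* 1 -> 0, 2 -> 1, 4 -> 3 */
}

/* DR7 value that locally enables breakpoint 0 and nothing else. */
static uint32_t dr7_for_bp0(uint32_t rw, unsigned len)
{
    return 1u | (rw << 16) | (dr7_len_bits(len) << 18);
}

/* Does an access to [addr, addr + size) overlap the watched bytes? */
static int watch_hits(uint32_t dr0, unsigned len,
                      uint32_t addr, uint32_t size)
{
    return addr < dr0 + len && dr0 < addr + size;
}
```

A frame that steps over the watched word without writing to it goes
unnoticed, e.g. watch_hits(0x1000, 4, 0x0f00, 16) is 0.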

I also suspect that if NOREDZONE is defined, it is set to trap
on writes to the start of struct user, not 2 pages into a 4 page
stack.

This stack overflow detect code looks badly stuffed :-)
I would have:
- switched 'struct user' to the upper end of the allocated pages
- allocated 3 pages of address space, but only mapped physical
  memory to the upper 2.
- if the kernel traps on the third (ie lowest) page report a
  stack overflow.
  (This might mean having a page of physical memory reserved.
  I presume the x86 interrupt scheme allows for faults
  on the kernel stack...)
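
A rough sketch of the layout I have in mind, as plain address
arithmetic. It is not kernel code: the names are invented, the 1k
for struct user is the approximate figure from above, and 'base' is
assumed to be the lowest address of the 3 page area:

```c
#include <stdint.h>

#define PAGE_BYTES   4096u
#define AREA_PAGES   3u       /* 3 pages of address space */
#define USER_BYTES   1024u    /* sizeof (struct user), about 1k (assumed) */

enum region { OUTSIDE, GUARD_PAGE, KSTACK, USER_STRUCT };

/* Say which part of the 3 page area an address falls in. */
static enum region classify(uint32_t base, uint32_t addr)
{
    uint32_t top = base + AREA_PAGES * PAGE_BYTES;

    if (addr < base || addr >= top)
        return OUTSIDE;
    if (addr < base + PAGE_BYTES)
        return GUARD_PAGE;      /* lowest page, no physical memory */
    if (addr >= top - USER_BYTES)
        return USER_STRUCT;     /* struct user at the upper end */
    return KSTACK;              /* stack grows down towards the guard */
}

/* The check a page fault handler would make on the faulting address. */
static int is_stack_overflow(uint32_t base, uint32_t fault_addr)
{
    return classify(base, fault_addr) == GUARD_PAGE;
}
```

The point of the arrangement is that the stack runs into the
unmapped page before it can scribble on anything, so any access in
that page can be reported as an overflow.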

	David

-- 
David Laight: david@l8s.co.uk