Subject: port-i386/34186: mapping of msgbuf during startup may map invalid physical adresses
To: None <port-i386-maintainer@netbsd.org, gnats-admin@netbsd.org,>
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
List: netbsd-bugs
Date: 08/11/2006 14:20:01
>Number:         34186
>Category:       port-i386
>Synopsis:       mapping of msgbuf during startup may map invalid physical adresses
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-i386-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 11 14:20:00 +0000 2006
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 3.0.1, current
>Organization:
Dr. Nagler & Company GmbH
	
>Environment:
	
	
System: NetBSD s012 2.1 NetBSD 2.1 (NSW-S012) #10: Mon Dec 12 12:03:54 CET 2005 wgstuken@s011:/export/netbsd-2.1/usr/src/sys/arch/i386/compile/NSW-S012 i386
Architecture: i386
Machine: i386
>Description:
	If the last available physical memeory segment on a system is less 16k, than the
	startup code that will map the kernel message buffer, will fail and map physical
	pages behind the last segment.
	This may either only lead to a message buffer without physical memory behind it,
	or to an overlapping message buffer with something else.

	Analyses of the problem on our Intel D945 board:

	The system will get the following five available memory segments from the BIOS:
	(remark: the avail coloumn means the corresponding component of the structure)

	start		end		avail
	1000		3f07c		3707c
	919		1000		1000
	4		9f		9f
	3f6e9		3f6ed		3f6ed
	3f6ff		3f700		3f700

	The startup code in machdep.c will try to get the 16k from the last segment,
	determines that there are only 4k available, prints out a warning and setup the
	variable msgbuf_paddr the theese 4k.
	Later in cpu_startup() this address is used to map 16k to it. BUG!

	Now there are two ways to correct this problem.
	1. force cpu_startup() to map only the detected size
	2. allow multiple segments to map the message buffer of the requested size.

	Both sollution are not perfect, because you either get a realy small message
	buffer on some motherboards, the second one lead to an unknown number of segments
	at compile time.
	The best sollution is in the middle, I think.

	The fix added below will add a segment list with a static, but I hope via kernel
	configuration selectable, size that will be filled up in a loop (implemented by a
	goto back to the start of the block) that will fill up the segment list until no
	space is available. If this is still no enougth the warning is printer as before.
	The function cpu_startup() will determine the size of the message buffer from the
	segment list and will map the allocated pages to the allocated virtual adressspace
	with the size determined from the segment list.
	The default segment list size is set to 2. That should be good for most cases.

	In the file pmap.c there seems to be some dead code for a previous message buffer
	allocation method. The allocated virtual address for the message buffer in pmap.c
	is never used and will always be overwritten by the allocation in cpu_startup().
	My patch will remove theese old fragments.

	The patch has been tested and it works fine.
	Remark: I've done the test only on a sigle processor machine, because I've no
		multiprocessor available at the moment.

	Remark: The code in the port-amd looks nearly the same as for i386.
	  The only difference I've located ist the fact, that something is done with
	  PTE's in pmap.c and I'm not shure what it will exactly do.
	  Please forward this bugreport to the port-amd maintainer. Thanks
>How-To-Repeat:
	just get a motherboard that will supply a last available memory segment less 16k.
>Fix:
	Here comes the ouput of "rcsdiff -c *.c" for the files /usr/src/sys/arch/i386/i386/pmap.c
	and /usr/src/sys/arch/i386/i386/machdep.c.


*** machdep.c	2006/08/11 09:40:35	1.1
--- machdep.c	2006/08/11 12:05:17
***************
*** 238,245 ****
  
  int	tmx86_has_longrun;
  
  vaddr_t	msgbuf_vaddr;
! paddr_t msgbuf_paddr;
  
  vaddr_t	idt_vaddr;
  paddr_t	idt_paddr;
--- 238,252 ----
  
  int	tmx86_has_longrun;
  
+ #ifndef MSGBUF_MAX_SEG
+ #define MSGBUF_MAX_SEG 2
+ #endif
  vaddr_t	msgbuf_vaddr;
! struct {
! 	paddr_t paddr;
! 	psize_t sz;
! } msgbuf_p_seg[MSGBUF_MAX_SEG];
! unsigned int msgbuf_p_cnt = 0;
  
  vaddr_t	idt_vaddr;
  paddr_t	idt_paddr;
***************
*** 283,303 ****
  void
  cpu_startup()
  {
! 	int x;
  	vaddr_t minaddr, maxaddr;
  	char pbuf[9];
  
  	/*
  	 * Initialize error message buffer (et end of core).
  	 */
! 	msgbuf_vaddr = uvm_km_valloc(kernel_map, x86_round_page(MSGBUFSIZE));
  	if (msgbuf_vaddr == 0)
  		panic("failed to valloc msgbuf_vaddr");
  
  	/* msgbuf_paddr was init'd in pmap */
! 	for (x = 0; x < btoc(MSGBUFSIZE); x++)
! 		pmap_kenter_pa((vaddr_t)msgbuf_vaddr + x * PAGE_SIZE,
! 		    msgbuf_paddr + x * PAGE_SIZE, VM_PROT_READ|VM_PROT_WRITE);
  	pmap_update(pmap_kernel());
  
  	initmsgbuf((caddr_t)msgbuf_vaddr, round_page(MSGBUFSIZE));
--- 290,316 ----
  void
  cpu_startup()
  {
! 	int x, y;
  	vaddr_t minaddr, maxaddr;
+ 	psize_t sz;
  	char pbuf[9];
  
  	/*
  	 * Initialize error message buffer (et end of core).
  	 */
! 	if (msgbuf_p_cnt == 0)
! 		panic("msgbuf paddr map has not been set up");
! 	for (x = 0, sz = 0; x < msgbuf_p_cnt; sz += msgbuf_p_seg[x++].sz);
! 	msgbuf_vaddr = uvm_km_valloc(kernel_map, sz);
  	if (msgbuf_vaddr == 0)
  		panic("failed to valloc msgbuf_vaddr");
  
  	/* msgbuf_paddr was init'd in pmap */
! 	for (y = 0, sz = 0; y < msgbuf_p_cnt; y++) {
! 		for (x = 0; x < btoc(msgbuf_p_seg[y].sz); x++, sz += PAGE_SIZE)
! 			pmap_kenter_pa((vaddr_t)msgbuf_vaddr + sz,
! 			    msgbuf_p_seg[y].paddr + x * PAGE_SIZE, VM_PROT_READ|VM_PROT_WRITE);
! 	}
  	pmap_update(pmap_kernel());
  
  	initmsgbuf((caddr_t)msgbuf_vaddr, round_page(MSGBUFSIZE));
***************
*** 1695,1700 ****
--- 1708,1714 ----
  		psize_t sz = round_page(MSGBUFSIZE);
  		psize_t reqsz = sz;
  
+ 	search_again:
  		for (x = 0; x < vm_nphysseg; x++) {
  			vps = &vm_physmem[x];
  			if (ptoa(vps->avail_end) == avail_end)
***************
*** 1709,1715 ****
  
  		vps->avail_end -= atop(sz);
  		vps->end -= atop(sz);
! 		msgbuf_paddr = ptoa(vps->avail_end);
  
  		/* Remove the last segment if it now has no pages. */
  		if (vps->start == vps->end) {
--- 1723,1730 ----
  
  		vps->avail_end -= atop(sz);
  		vps->end -= atop(sz);
! 		msgbuf_p_seg[msgbuf_p_cnt].sz = sz;
! 		msgbuf_p_seg[msgbuf_p_cnt++].paddr = ptoa(vps->avail_end);
  
  		/* Remove the last segment if it now has no pages. */
  		if (vps->start == vps->end) {
***************
*** 1723,1732 ****
  				avail_end = vm_physmem[x].avail_end;
  		avail_end = ptoa(avail_end);
  
  		/* Warn if the message buffer had to be shrunk. */
- 		if (sz != reqsz)
  			printf("WARNING: %ld bytes not available for msgbuf "
! 			    "in last cluster (%ld used)\n", reqsz, sz);
  	}
  
  	/*
--- 1738,1754 ----
  				avail_end = vm_physmem[x].avail_end;
  		avail_end = ptoa(avail_end);
  
+ 		if (sz != reqsz) {
+ 			reqsz -= sz;
+ 			if (msgbuf_p_cnt != MSGBUF_MAX_SEG) {
+ 		/* if still segments available, get memory from next one ... */
+ 			      sz = reqsz;
+ 			      goto search_again;
+ 			}
  		/* Warn if the message buffer had to be shrunk. */
  			printf("WARNING: %ld bytes not available for msgbuf "
! 			    "in last cluster (%ld used)\n", (long)MSGBUFSIZE, MSGBUFSIZE - reqsz);
! 		}
  	}
  
  	/*
*** pmap.c	2006/08/11 09:40:35	1.1
--- pmap.c	2006/08/11 11:27:01
***************
*** 448,456 ****
  
  caddr_t vmmap; /* XXX: used by mem.c... it should really uvm_map_reserve it */
  
- extern vaddr_t msgbuf_vaddr;
- extern paddr_t msgbuf_paddr;
- 
  extern vaddr_t idt_vaddr;			/* we allocate IDT early */
  extern paddr_t idt_paddr;
  
--- 448,453 ----
***************
*** 1072,1080 ****
  	vmmap = (char *)virtual_avail;			/* don't need pte */
  	virtual_avail += PAGE_SIZE;
  
- 	msgbuf_vaddr = virtual_avail;			/* don't need pte */
- 	virtual_avail += round_page(MSGBUFSIZE);
- 
  	idt_vaddr = virtual_avail;			/* don't need pte */
  	virtual_avail += PAGE_SIZE;
  	idt_paddr = avail_start;			/* steal a page */
--- 1069,1074 ----

>Unformatted: