Subject: Re: kern/32717: alpha 3.0 install kernel doesn't see scsi disks
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Ken Raeburn <raeburn@MIT.EDU>
List: netbsd-bugs
Date: 04/01/2006 02:20:01

The following reply was made to PR kern/32717; it has been noted by GNATS.

From: Ken Raeburn <raeburn@MIT.EDU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/32717: alpha 3.0 install kernel doesn't see scsi disks
Date: Fri, 31 Mar 2006 21:15:32 -0500

 So, I got another machine (PWS500au, also with SCSI disks) installed  
 under 3.0 (no problems), and started building kernels.  Doing an  
 (approximately) binary search, I found the point at which trunk kernel
 sources stop working and produce the error I described in my problem
 report ("request sense for a request sense ?").  With the CVS sources
 from 9/17/2004 20:45Z, my XP1000 recognizes its disks; with the sources
 from 21:00Z the same day, it reports that error.

 Only two files changed in this time interval, uvm/uvm_page.c and
 uvm/uvm_pglist.c, each by a single line aside from CVS keywords.
 Version 1.100 of uvm_page.c and version 1.32 of uvm_pglist.c share
 this log message:

 date: 2004/09/17 20:46:03;  author: yamt;  state: Exp;  lines: +3 -3
 make free page queue filo rather than fifo.
 data in pages freed more recently are more likely on cpu cache.
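
 To make the difference concrete, here is a minimal Python sketch of a
 free-page queue under the two policies. It is a toy model, not the
 UVM code (which is C), and the class and function names are
 hypothetical. Allocation always takes the head; the only difference is
 whether a freed page goes back on the head (LIFO, the "filo" behaviour
 of the new revisions) or on the tail (FIFO, the old behaviour).

```python
from collections import deque


class FreePageQueue:
    """Toy free-page queue; pages are plain integers (hypothetical model)."""

    def __init__(self, pages, lifo):
        self.q = deque(pages)
        self.lifo = lifo

    def free(self, page):
        # LIFO: the freed page goes to the head, so it is reused next
        # (and is likely still warm in the CPU cache).
        # FIFO: the freed page goes to the tail and waits behind all others.
        if self.lifo:
            self.q.appendleft(page)
        else:
            self.q.append(page)

    def alloc(self):
        # Allocation always takes the page at the head of the queue.
        return self.q.popleft()


def reuse_distance(lifo, npages=8):
    """Count allocations until a just-freed page is handed out again."""
    fq = FreePageQueue(range(npages), lifo)
    page = fq.alloc()
    fq.free(page)
    count = 0
    while True:
        count += 1
        if fq.alloc() == page:
            return count


print(reuse_distance(lifo=True))   # 1
print(reuse_distance(lifo=False))  # 8
```

 With eight free pages, a just-freed page is handed out again on the
 very next allocation under LIFO, but only on the eighth under FIFO.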

 Updating to netbsd-3-0-RELEASE and reverting the change to
 uvm_page.c (alone or together with uvm_pglist.c) gives me an INSTALL
 kernel that recognizes the disks and comes up to single-user mode
 once I tell it to use sd0a as root; from there I can find and run ps,
 ls, and reboot.  Reverting only uvm_pglist.c produces a kernel that
 still shows the failure I first reported.  On the netbsd-3-0 branch
 (as of about 20:10 US/Eastern) I get the same result: with the current
 CVS version the SCSI controller reports errors, but if I undo this
 uvm_page.c change and make the free page queue FIFO again, it's happy.

 I have not yet tried building install media with this uvm_page.c
 change reverted.

 The uvm_page.c change itself seems logical.  Assuming it's actually
 correct, my guess is that some page is being put onto the free list,
 and probably allocated again, while some other part of the kernel (or
 a DMA device) isn't done with it yet, and the FIFO queue happens to
 delay the page's reuse long enough to hide the problem.  Or maybe it's
 bad memory: during boot, one allocation pattern trips over it
 consistently, while the other pattern (and likewise running NetBSD 2.0
 and doing nightly builds of some code I work on) doesn't in any
 noticeable way.  But I think I'm done for tonight....
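
 The first hypothesis can be illustrated with a toy Python simulation.
 Everything here is hypothetical and models no actual NetBSD code or
 driver: a page is freed while a device write to it is still in
 flight, the write lands after a given number of further allocations,
 and we check whether the page already belongs to a new owner by then.

```python
from collections import deque


def simulate(lifo, npages=16, dma_delay=3):
    """Toy model of a premature free racing a late DMA write (hypothetical).

    A driver frees a page while a device write to it is still pending.
    The write lands after `dma_delay` further allocations (which must not
    exceed `npages`).  Returns True if the late write clobbers a page that
    has already been handed to a new owner.
    """
    free_q = deque(range(npages))
    owner = {}  # page -> name of its current owner

    victim = free_q.popleft()
    owner[victim] = "old owner"
    # The bug being modelled: the page is freed before the device is done.
    del owner[victim]
    if lifo:
        free_q.appendleft(victim)  # reused on the very next allocation
    else:
        free_q.append(victim)      # waits behind every other free page

    for i in range(dma_delay):
        page = free_q.popleft()
        owner[page] = f"new owner {i}"

    # The in-flight device write now lands in `victim`.
    return victim in owner


print(simulate(lifo=True))                 # True: corruption is visible
print(simulate(lifo=False))                # False: FIFO hides the bug
print(simulate(lifo=False, dma_delay=16))  # True: a longer delay exposes it
```

 In this model FIFO only masks the bug: once the delay reaches the
 number of free pages, the FIFO case is clobbered too.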

 Ken