Subject: Re: kern/32717: alpha 3.0 install kernel doesn't see scsi disks
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Ken Raeburn <raeburn@MIT.EDU>
List: netbsd-bugs
Date: 04/01/2006 02:20:01
The following reply was made to PR kern/32717; it has been noted by GNATS.
From: Ken Raeburn <raeburn@MIT.EDU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/32717: alpha 3.0 install kernel doesn't see scsi disks
Date: Fri, 31 Mar 2006 21:15:32 -0500
So, I got another machine (PWS500au, also with SCSI disks) installed
under 3.0 (no problems), and started building kernels. Doing an
(approximately) binary search, I found a point where the kernel
source on the trunk stops working, showing the error I indicated in
my problem report ("request sense for a request sense ?"). With the
CVS sources from 2004/09/17 20:45Z, my XP1000 recognizes its disks.
With the CVS sources from 21:00Z the same day, it reports an error.
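For illustration only, the date-based search described above can be sketched in C. This is not the procedure actually run: the real "test" step is manual (check out the tree as of a given date, for example with the -D option of cvs, build a kernel, boot it, and watch whether the disks are probed). The names bisect_by_date, works_fn and works_before are hypothetical, and the sketch assumes there is exactly one transition from working to broken inside the interval.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical callback: nonzero if a kernel built from sources as of
 * time t works.  In practice this is a manual build-and-boot step. */
typedef int (*works_fn)(time_t t, void *ctx);

/* Stand-in predicate for demonstration: everything before the time
 * pointed to by ctx is treated as working. */
static int works_before(time_t t, void *ctx)
{
    return t < *(const time_t *)ctx;
}

/* Narrow [*good, *bad] until the two ends are at most `resolution`
 * seconds apart.  Invariant: *good works, *bad does not. */
static void bisect_by_date(time_t *good, time_t *bad, time_t resolution,
                           works_fn works, void *ctx)
{
    assert(*good < *bad && resolution > 0);
    while (*bad - *good > resolution) {
        time_t mid = *good + (*bad - *good) / 2;
        if (works(mid, ctx))
            *good = mid;   /* breakage happened later */
        else
            *bad = mid;    /* breakage happened at or before mid */
    }
}
```

With a resolution of 900 seconds, a search of this kind ends with a working and a failing timestamp no more than fifteen minutes apart, which is the sort of window reported above (20:45Z versus 21:00Z).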
Only two files changed in this time interval: uvm/uvm_page.c and
uvm/uvm_pglist.c, each changed in one line aside from CVS keywords.
Version 1.100 of uvm_page.c and version 1.32 of uvm_pglist.c have
this log message:
date: 2004/09/17 20:46:03; author: yamt; state: Exp; lines: +3 -3
make free page queue filo rather than fifo.
data in pages freed more recently are more likely on cpu cache.
Updating to netbsd-3-0-RELEASE and reverting the change to
uvm_page.c, or to both files, gives me an INSTALL kernel that
recognizes the disks, and is able to come up to single-user mode once
I tell it to use sd0a for the root, and I can find and run ps, ls,
and reboot. Reverting only uvm_pglist.c produces a kernel that shows
the failure I first reported. On the netbsd-3-0 branch (as of about
20:10 US/Eastern) I get the same result -- the SCSI controller
reports errors using the current CVS version, but if I undo this
uvm_page.c change and make it a FIFO queue again, it's happy.
I have not yet tried building install media with the patch to
uvm_page.c.
The uvm_page.c change itself seems logical. Assuming it's actually
correct, I would guess that my problem means that some page is being
put onto the free list, and probably allocated again, while some
other part of the kernel (or a DMA device) isn't done with it yet,
and the FIFO version of the queue happens to give the extra time
needed. Or maybe it's bad memory: during the boot process, one
pattern of usage trips over it consistently, while the other pattern
(like running NetBSD 2.0 and doing nightly builds of some code I
work on) does not, at least not in any noticeable way. But I think
I'm done for tonight....
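To make the "extra time" guess concrete, here is a toy model in C of a free-page queue. It is not the uvm code: struct freeq, freeq_put, freeq_get and reuse_distance are hypothetical names, and the queue is a small ring buffer rather than the kernel's own list structure. The only point it shows is how soon a just-freed page is handed out again when frees go to the head of the queue versus the tail, with allocation always taking from the head.

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 8

/* Toy free-page queue: a ring buffer of page numbers used as a deque. */
struct freeq {
    int slot[NPAGES];
    size_t head;    /* index of the first (next to allocate) entry */
    size_t count;   /* number of entries currently queued */
};

enum free_policy { FREE_TO_TAIL, FREE_TO_HEAD };

/* Put a freed page on the queue, at the tail or at the head. */
static void freeq_put(struct freeq *q, int page, enum free_policy p)
{
    assert(q->count < NPAGES);
    if (p == FREE_TO_HEAD) {
        q->head = (q->head + NPAGES - 1) % NPAGES;
        q->slot[q->head] = page;
    } else {
        q->slot[(q->head + q->count) % NPAGES] = page;
    }
    q->count++;
}

/* Allocation always takes the page at the head. */
static int freeq_get(struct freeq *q)
{
    assert(q->count > 0);
    int page = q->slot[q->head];
    q->head = (q->head + 1) % NPAGES;
    q->count--;
    return page;
}

/* Number of allocations until a page freed just now is handed out
 * again, given `nfree` other pages already sitting on the queue. */
static size_t reuse_distance(enum free_policy p, size_t nfree)
{
    struct freeq q = { .head = 0, .count = 0 };
    const int victim = 100;     /* arbitrary page number */
    size_t n = 0;

    assert(nfree < NPAGES);
    for (size_t i = 0; i < nfree; i++)
        freeq_put(&q, (int)i, FREE_TO_TAIL);
    freeq_put(&q, victim, p);
    for (;;) {
        n++;
        if (freeq_get(&q) == victim)
            return n;
    }
}
```

In this model, with five other pages already free, freeing to the head means the very next allocation returns the just-freed page, while freeing to the tail delays its reuse until the sixth allocation. If something were still using the page after it was freed, the tail policy would be more likely to hide that.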
Ken