Subject: ccd changed in current?
To: None <current-users@NetBSD.ORG>
From: Michael L. VanLoon -- HeadCandy.com <michaelv@MindBender.serv.net>
List: current-users
Date: 08/28/1997 16:35:37
I have a NetBSD system, built from sources between 1.2 and 1.2.1 (it's
a stock 1.2 system with some of the 1.2.1 patches).  It's been running
great.  90+ days uptime at one point.

However, I've been wanting to upgrade it to current for a while now.  I
just started doing that today.  I tried booting the kernel from the i386
8/15 snapshot on warped.com.

The machine hangs whenever I try to fsck my /u filesystem, which is a
ccd spread over four SCSI drives.  To further complicate matters, the
drives are spread over three SCSI controllers (a 2940UW and a dual-
channel 3940UW).  Like I said, the ccd on these four drives and these
three controllers has been running with 90+ days uptime -- i.e. no
problems with the hardware.

It looks like this:

	ahc0 targ0 - / /var
	ahc0 targ5 - /altroot1 ccd0:/u
	ahc1 targ6 - /altroot2 ccd0:/u
	ahc2 targ5 - /altroot3 ccd0:/u
	ahc2 targ6 - /altroot4 ccd0:/u

(there's also a jaz drive, a CD-ROM drive and a DAT drive in there,
 but they aren't involved, as far as I can tell)

The current kernel boots fine into single-user mode.  I can fsck each
of my disks individually, including small altroots on each of the four
disks where the big ccd lives.  I then do a "ccdconfig -gv", which
appears to work correctly.

This is where the problem happens -- if I try to read (or write, I
suppose, but reading is enough) the ccd filesystem, the machine simply
hangs, without spitting out any diagnostics.  It has happened twice.
The first time, I ran "fsck -nf /dev/rccd0f", and it hung.  The second
time, I simply mounted the ccd read-only after coming up in
single-user mode; the mount worked, but the machine hung immediately
afterward, when I tried "ls /u".

So, I'm wondering if ccd changed somehow between 1.2-ish and current
as of this month, so that it might be allocating the filesystem
structures differently from the interleaved disks, and getting
confused?  Any other ideas?
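
To make the question concrete, here is a minimal sketch of plain
round-robin striping, i.e. how a logical block on a striped volume
could map to a component disk and an offset on that disk.  This is
only an illustration, not the ccd driver's actual code: the function
name map_block is hypothetical, the interleave of 16 blocks is an
assumed value (the real one isn't given in this message), and all
components are assumed to be the same size.  If the mapping differed
between two kernels, the same logical block would land on a different
disk or offset, and existing data would no longer be found where the
filesystem expects it.

```python
def map_block(block, interleave, ncomponents):
    """Map a logical block number to (component index, block offset).

    Simple round-robin striping over equal-size components:
    consecutive runs of `interleave` blocks go to consecutive disks.
    """
    # Which stripe unit the block falls in, and where inside that unit.
    stripe, within = divmod(block, interleave)
    # Stripe units are dealt out to the components in turn.
    row, component = divmod(stripe, ncomponents)
    return component, row * interleave + within


if __name__ == "__main__":
    # Assumed values for illustration: 4 disks, interleave of 16 blocks.
    for b in (0, 15, 16, 63, 64):
        print(b, map_block(b, 16, 4))
```

With these assumed values, blocks 0 through 15 sit on the first disk,
block 16 starts the second disk, and block 64 wraps back to the first
disk at offset 16.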

Another possibility is that the ahc driver changed so that it no
longer works reliably with multiple controllers.  Or perhaps the
PCI-PCI bridge on the 3940UW is somehow getting in the way (even
though it isn't under 1.2).

I have rebooted the machine with my 1.2+ kernel, and it once again is
running reliably on the same hardware.

-----------------------------------------------------------------------------
  Michael L. VanLoon                           michaelv@MindBender.serv.net
      Contract software development for Windows NT, Windows 95 and Unix.
             Windows NT and Unix server development in C++ and C.

        --<  Free your mind and your machine -- NetBSD free un*x  >--
    NetBSD working ports: 386+PC, Mac 68k, Amiga, Atari 68k, HP300, Sun3,
        Sun4/4c/4m, DEC MIPS, DEC Alpha, PC532, VAX, MVME68k, arm32...
    NetBSD ports in progress: PICA, others...
-----------------------------------------------------------------------------