Subject: kern/10001: LFS breaks root on md0a
To: None <gnats-bugs@gnats.netbsd.org>
From: John Hawkinson <jhawk@mit.edu>
List: netbsd-bugs
Date: 04/27/2000 18:39:16
>Number:         10001
>Category:       kern
>Synopsis:       LFS breaks root on md0a
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 27 18:40:00 PDT 2000
>Closed-Date:
>Last-Modified:
>Originator:     John Hawkinson
>Release:        1.4.2 and also -current snapshot from 2/26/2000
>Organization:
	MIT
>Environment:
	
System: NetBSD zorkmid.mit.edu 1.4.2 NetBSD 1.4.2 (ZORKMID) #101: Wed Apr 19 22:29:43 EDT 2000 jhawk@zorkmid.mit.edu:/usr/src/sys/arch/i386/compile/ZORKMID i386


>Description:
	lfs_mountroot() appears to make my kernel hang in tsleep()
(called from getblk()) when root is on md0a (a memory disk).

    Since all configured filesystem types are attempted in order and lfs
is tried before ffs, you see this even when the md holds an ffs filesystem.
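
    The ordering problem can be sketched in user space as below. This is
only an illustration under stated assumptions, not the kernel's code: the
table, the result codes, and the names fs_entry, lfs_try, ffs_try and
try_mountroot are all hypothetical, and the hang is modelled as a return
value rather than an actual sleep.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch (not kernel values). */
enum { MOUNT_OK = 0, MOUNT_REJECT = 1, MOUNT_HANG = 2 };

struct fs_entry {
	const char *name;
	int (*mountroot)(void);
};

/* Models the reported behaviour: the lfs attempt never returns. */
static int lfs_try(void) { return MOUNT_HANG; }
/* Models an ffs image on the memory disk: the ffs attempt would succeed. */
static int ffs_try(void) { return MOUNT_OK; }

/* lfs is listed before ffs, as described in the report. */
static const struct fs_entry fs_table[] = {
	{ "lfs", lfs_try },
	{ "ffs", ffs_try },
};

/*
 * Try each filesystem type in table order. If 'only' is non-NULL, try
 * just that type (the analogue of answering 'ffs' instead of 'generic').
 * Returns the name of the type that mounted, or NULL. On a simulated
 * hang, *hung_in is set to the type that hung and the walk stops, so
 * later types are never reached.
 */
static const char *
try_mountroot(const char *only, const char **hung_in)
{
	size_t i;

	assert(hung_in != NULL);
	*hung_in = NULL;
	for (i = 0; i < sizeof(fs_table) / sizeof(fs_table[0]); i++) {
		if (only != NULL && strcmp(only, fs_table[i].name) != 0)
			continue;
		switch (fs_table[i].mountroot()) {
		case MOUNT_OK:
			return fs_table[i].name;
		case MOUNT_HANG:
			*hung_in = fs_table[i].name;
			return NULL;
		default:
			break;	/* rejected: fall through to the next type */
		}
	}
	return NULL;
}
```

    With no type forced, the walk stops in lfs and ffs is never tried;
forcing "ffs" skips the lfs attempt entirely.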

>How-To-Repeat:
	

	Add the install-specific spooge to GENERIC:


options         MEMORY_DISK_HOOKS
options         MEMORY_DISK_IS_ROOT     # force root on memory disk
options         MEMORY_DISK_SERVER=0    # no userspace memory disk support
options         MINIROOTSIZE=3174       # size of memory disk, in blocks

	Build GENERIC, then:

	cd /usr/src/distrib/i386/floppies/bootfloppy-big
	make KERN=/sys/arch/i386/compile/GENERIC/netbsd netbsd.ram.gz

	Boot it.

The kernel hangs after printing 'root on md0a'. The ddb backtrace is:

db> t
_Debugger(c085fec0,c065bce4,c065bce4,c024903c,7fffffff) at _Debugger+0x4
_comintr(c0836900) at _comintr+0xb2
_Xintr4() at _Xintr4+0x70
--- interrupt ---
_idle(0,c12ae5b0,0,0,c06cde94) at _idle+0x12
bpendtsleep(c12ae5b0,11,c01a8cff,0,c06cdf18) at bpendtsleep
_getblk(c4d8b140,10,2000,0,0) at _getblk+0x92
_bread(c4d8b140,10,2000,c085a880,c06cdf18) at _bread+0x2d
_lfs_mountfs(c4d8b140,c088c000,c065bce4,c03b1bb0,c065bce4) at _lfs_mountfs+0x19b
_lfs_mountroot(ffffffff,c06cdfa8,c0183f76,c06cb010,6cb000) at _lfs_mountroot+0x67
_vfs_mountroot(c06cb010,6cb000,6d2000,90,800007ff) at _vfs_mountroot+0x9d
_main(0,0,0,0,0) at _main+0x362

which is this segment of code from getblk():

                s = splbio();
                if (ISSET(bp->b_flags, B_BUSY)) {
                        SET(bp->b_flags, B_WANTED);
                        err = tsleep(bp, slpflag | (PRIBIO + 1), "getblk",
                            slptimeo);
                        splx(s);

Some have theorized that this is a race being tickled by the md, or
perhaps a missing brelse() somewhere.
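
To make the missing-brelse() theory concrete, here is a rough user-space
model of the busy/wanted handshake. It is a sketch only: sim_buf,
sim_getblk and sim_brelse are hypothetical stand-ins, the flag values are
illustrative rather than the kernel's, and the sleep is modelled as a
return value instead of a call to tsleep().

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag values; the kernel's real definitions differ. */
#define B_BUSY   0x01
#define B_WANTED 0x02

struct sim_buf {
	int b_flags;
};

/*
 * Returns 1 if the caller acquired the buffer, 0 if it would have to
 * sleep (the point where the real getblk() calls tsleep()).
 */
static int
sim_getblk(struct sim_buf *bp)
{
	assert(bp != NULL);
	if (bp->b_flags & B_BUSY) {
		bp->b_flags |= B_WANTED;	/* ask the owner to wake us */
		return 0;
	}
	bp->b_flags |= B_BUSY;
	return 1;
}

/*
 * Releases the buffer. Returns 1 if a waiter had marked it wanted and
 * would now be woken, 0 otherwise.
 */
static int
sim_brelse(struct sim_buf *bp)
{
	int wake;

	assert(bp != NULL);
	wake = (bp->b_flags & B_WANTED) != 0;
	bp->b_flags &= ~(B_BUSY | B_WANTED);
	return wake;
}
```

If the first holder never calls sim_brelse(), every later sim_getblk() on
the same buffer reports that it must sleep, which matches the shape of
the hang above: nothing is left to issue the wakeup.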


>Fix:
	WORKAROUNDs:

		a) boot -r and specify 'ffs' instead of 'generic'
		b) deconfigure LFS from the kernel [untested]
>Release-Note:
>Audit-Trail:
>Unformatted: