Subject: port-sparc/2012: Multiple disks on SS1+ have serious problems
To: None <gnats-bugs@NetBSD.ORG>
From: Charlie Root <root@strikeforce.vas.viewlogic.com>
List: netbsd-bugs
Date: 02/01/1996 11:34:14
>Number:         2012
>Category:       port-sparc
>Synopsis:       Multiple disks on SS1+ have serious problems
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb  1 15:05:03 1996
>Last-Modified:
>Originator:     Grey Wolf
>Organization:
	Strike Force Alternatives, Ltd.
>Release:        NetBSD-current 29 Jan 1996
>Environment:
	Machine:	Sun SPARCstation 1+
	OS:		NetBSD 1.1A of 29 Jan 1996
	Target:		sparc (sun4c)
	Disks:		sd1: SUN0424 (Seagate ST1480N)
			sd3: SUN0424 (Seagate ST1480N)

System: NetBSD strikeforce 1.1A NetBSD 1.1A (STRIKEFORCE) #2: Mon Jan 29 17:18:19 PST 1996 root@strikeforce:/usr/src/sys/arch/sparc/compile/STRIKEFORCE sparc

>Description:
	As long as I access only one disk at a time, all is well.
	As soon as I access both disks at once, I get the following
	messages repeatedly, usually followed by a panic such as
	"panic: blkfree: freeing free {frag,block}", "panic: ifree:
	freeing free inode" or "panic: ialloc: allocating already
	allocated inode":

	Jan 31 16:15:19 strikeforce /netbsd: RESELECT: 9 bytes in FIFO>esp0: illegal command: 0x12 (state 5, phase 7, prevphase 101)
	Jan 31 16:15:21 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x0, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:21 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x2, dleft 400), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:22 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:22 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:23 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x10, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x10, dleft 400), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: RESELECT: 9 bytes in FIFO>esp0: illegal command: 0x12 (state 5, phase 7, prevphase 101)
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x0, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x10, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
	Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
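
	To see at a glance whether both targets are affected, the
	timeouts above can be tallied per device.  This is only a rough
	sketch: the function name summarize_timeouts is made up for
	illustration, and it assumes the syslog line format shown above
	(device name followed by "(esp0:T:L):" and the words "timed out").

```sh
# Hypothetical helper: count "timed out" messages per sd device.
# Reads syslog lines on stdin, prints "<device> <count>" sorted by device.
summarize_timeouts() {
    awk '/timed out/ {
        # Look for a field such as "sd3(esp0:3:0):" and keep the part
        # before the opening parenthesis as the device name.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^sd[0-9]+\(/) {
                split($i, parts, "(")
                count[parts[1]]++
            }
        }
    }
    END {
        for (dev in count)
            print dev, count[dev]
    }' | sort
}
```

	It would be fed the log on stdin, e.g.
	"grep esp0 /var/log/messages | summarize_timeouts" (the log path
	is an assumption; use wherever syslog writes kernel messages).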

>How-To-Repeat:
	Running 'fsck -p' in single-user mode reproduces this just
	fine.  I had two out of six filesystems fail.  Both disks
	check out fine, i.e. I'm not getting random bad blocks.  I ran
	format/analyze under SunOS (booted from CD-ROM) on BOTH disks,
	and they check out fine.  I also did a dd from start to finish
	on both disks to /dev/null (dd if=/dev/rsd[13]c of=/dev/null
	bs=720k conv=block) and got 1151+0 records in, 1151+0 records
	out.  No errors.

	On sd3 (esp0:3:0) I have:	On sd1 (esp0:1:0) I have:
		a: root				a: /altroot
		b: swap				b: swap (not in use)
		d: /var				h: /usr/X11
		g: /usr
		h: /usr/src

	Anything that accesses both drives simultaneously (sometimes
	even sync()) causes this fault.  Data IS LOST!  The filesystem
	becomes corrupted and I usually get a panic message having to
	do with allocating or freeing a block, frag or inode.
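
	A read-only way to exercise both drives at once, without
	touching the filesystems, might look like the sketch below.  It
	reuses the dd invocation from the check above but starts both
	readers in the background.  The function name read_both is made
	up for illustration, and whether concurrent raw reads alone are
	enough to trigger the fault is an assumption I have not
	confirmed.

```sh
# Hypothetical helper: read two devices (or files) concurrently with dd
# and report the exit status of each reader.
# Returns 0 only if both reads succeeded.
read_both() {
    # Start both readers in the background so they overlap in time.
    dd if="$1" of=/dev/null bs=720k &
    pid1=$!
    dd if="$2" of=/dev/null bs=720k &
    pid2=$!

    # Collect each exit status separately.
    wait "$pid1"; rc1=$?
    wait "$pid2"; rc2=$?

    echo "$1: dd exit $rc1"
    echo "$2: dd exit $rc2"
    [ "$rc1" -eq 0 ] && [ "$rc2" -eq 0 ]
}
```

	On this machine it would be called as
	"read_both /dev/rsd1c /dev/rsd3c" while watching the console
	for the esp0 messages shown in the Description.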
>Fix:
	Please do.  Is there any other information you need, and how
	would I go about getting it?
>Audit-Trail:
>Unformatted: