Subject: port-sparc/2012: Multiple disks on SS1+ have serious problems
To: None <gnats-bugs@NetBSD.ORG>
From: Charlie Root <root@strikeforce.vas.viewlogic.com>
List: netbsd-bugs
Date: 02/01/1996 11:34:14
>Number: 2012
>Category: port-sparc
>Synopsis: Multiple disks on SS1+ have serious problems
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: gnats-admin (GNATS administrator)
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 1 15:05:03 1996
>Last-Modified:
>Originator: Grey Wolf
>Organization:
Strike Force Alternatives, Ltd.
>Release: NetBSD-current 29 Jan 1996
>Environment:
Machine: Sun SPARCstation 1+
OS: NetBSD 1.1A of 29 Jan 1996
Target: sparc (sun4c)
Disks: sd1: SUN0424 (Seagate ST1480N)
sd3: SUN0424 (Seagate ST1480N)
System: NetBSD strikeforce 1.1A NetBSD 1.1A (STRIKEFORCE) #2: Mon Jan 29 17:18:19 PST 1996 root@strikeforce:/usr/src/sys/arch/sparc/compile/STRIKEFORCE sparc
>Description:
As long as I access only one disk at a time, all is well.
As soon as I access both disks, I get the following messages
repeatedly, usually followed by a panic along the lines of
"panic: blkfree: freeing free {frag,block}", "panic: ifree:
freeing free inode" or "panic: ialloc: allocating already
allocated inode":
Jan 31 16:15:19 strikeforce /netbsd: RESELECT: 9 bytes in FIFO>esp0: illegal command: 0x12 (state 5, phase 7, prevphase 101)
Jan 31 16:15:21 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x0, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:21 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x2, dleft 400), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:22 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:22 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:23 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x10, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x10, dleft 400), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: RESELECT: 9 bytes in FIFO>esp0: illegal command: 0x12 (state 5, phase 7, prevphase 101)
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x0, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x2, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0)
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a68f0 (flags 0x10, dleft 2000), state 3, phase 257, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a691c (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd1(esp0:1:0): timed out (ecb 0xf85a6898 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
Jan 31 16:15:24 strikeforce /netbsd: sd3(esp0:3:0): timed out (ecb 0xf85a68c4 (flags 0x10, dleft 2000), state 3, phase 0, msgpriq 0, msgout 0) AGAIN
>How-To-Repeat:
Well, let's see. 'fsck -p' in single-user mode (which checks
filesystems on different disks in parallel) will reproduce this
just fine. I had two out of six filesystems fail. The disks
themselves are sound, i.e. I'm not getting random bad blocks: I ran
format/analyze under SunOS (booted from CD-ROM) on BOTH disks and
they check out fine. I also did a dd from start to finish on both
disks to /dev/null (dd if=/dev/rsd[13]c of=/dev/null bs=720k
conv=block) and got 1151+0 records in, 1151+0 records out.
No errors.
On sd3 esp(0:3:0) I have:     On sd1 esp(0:1:0) I have:
a: root                       a: /altroot
b: swap                       b: swap (not in use)
d: /var                       h: /usr/X11
g: /usr
h: /usr/src
Anything that accesses both drives simultaneously (sometimes
even sync()) will cause this fault. Data IS LOST!
The filesystem becomes corrupted and I usually get a panic
message having to do with allocating or freeing a block, frag
or inode.
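A minimal sketch of a read-only way to exercise both drives at the
same time, assuming the raw device names /dev/rsd1c and /dev/rsd3c
from the dd test above. The function name concurrent_read and the
64 KB block size are illustrative choices, not something taken from
the failing system, and I have not confirmed that plain concurrent
reads alone are enough to trigger the fault:

```shell
#!/bin/sh
# Hypothetical helper: read two devices (or files) concurrently.
# It only reads; nothing is written to either device.
concurrent_read() {
    dev_a=$1
    dev_b=$2

    # Start both reads in the background so they overlap on the bus.
    dd if="$dev_a" of=/dev/null bs=65536 2>/dev/null &
    pid_a=$!
    dd if="$dev_b" of=/dev/null bs=65536 2>/dev/null &
    pid_b=$!

    # Collect both exit statuses; report failure if either dd failed.
    status=0
    wait "$pid_a" || status=1
    wait "$pid_b" || status=1
    return $status
}
```

It would be invoked as "concurrent_read /dev/rsd1c /dev/rsd3c" while
watching the console for the esp0 messages shown in the Description.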
>Fix:
Please do. Is there any other information you need, and how
would I go about getting it?
>Audit-Trail:
>Unformatted: