Subject: kern/23372: mlxctl can panic NetBSD-1.6.1_STABLE/alpha
To: None <gnats-bugs@gnats.netbsd.org>
From: Greg A. Woods <woods@weird.com>
List: netbsd-bugs
Date: 11/04/2003 20:16:03
>Number:         23372
>Category:       kern
>Synopsis:       mlxctl can panic NetBSD-1.6.1_STABLE/alpha
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov 05 01:17:00 UTC 2003
>Closed-Date:
>Last-Modified:
>Originator:     Greg A. Woods
>Release:        NetBSD 1.6.1_STABLE
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD proven 1.6.1_STABLE 
Architecture: alpha
Machine: alpha
>Description:

	Something odd happened to the Mylex RAID volumes when I
	rebooted my alpha today (after trying to boot a CD that
	apparently isn't bootable on an alpha): two of the three
	logical volumes, ld1 and ld2, came up offline and disabled.

	While trying to find out what was happening, I checked the
	controller status with "mlxctl -a -v cstatus", and the system
	panicked (console output below).

	I don't believe I've ever run "mlxctl" before.

	Note that I've used the two now-offline logical volumes (ld1 &
	ld2) extensively, one as /var/obj for a full system build and
	the other to store some bulk data.

>How-To-Repeat:

	Boot messages from today's reboot, followed by the mlxctl
	command that triggered the panic:

NetBSD 1.6.2_RC1 (BUILDING) #12: Sat Nov  1 15:54:03 EST 2003
    woods@building:/var/obj/BUILDING
AlphaServer 4000 5/400 4MB, 400MHz, s/n NI64906T1N
8192 byte page size, 2 processors.
total memory = 1536 MB
(2080 KB reserved for PROM, 1533 MB used by NetBSD)
avail memory = 1401 MB
using 9830 buffers containing 78640 KB of memory
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21164A-1
cpu0: VAX FP support, IEEE FP support, Primary Eligible
cpu0: Architecture extensions: 1<BWX>
cpu1 at mainbus0: ID 1, 21164A-2
cpu1: VAX FP support, IEEE FP support
cpu1: processor off-line; multiprocessor support not present in kernel
mcbus0 at mainbus0: 4MB BCache
mcmem0 at mcbus0 mid 1: Memory
mcpcia0 at mcbus0 mid 5: PCI Bridge
mcpcia0: Horse Revision 3, Left Handed Saddle Revision 3, CAP Revision 2
pci0 at mcpcia0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
siop0 at pci0 dev 1 function 0: Symbios Logic 53c810 (fast scsi)
siop0: interrupting at kn300 irq 36
scsibus0 at siop0: 8 targets, 8 luns per target
ppb0 at pci0 dev 2 function 0: Digital Equipment DECchip 21050 PCI-PCI Bridge (rev. 0x02)
pci1 at ppb0 bus 2
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
isp0 at pci1 dev 0 function 0: QLogic 1020 Ultra Wide SCSI HBA
isp0: interrupting at kn300 irq 40
scsibus1 at isp0: 16 targets, 8 luns per target
mlx0 at pci0 dev 3 function 0: Mylex RAID (v2 interface)
mlx0: interrupting at kn300 irq 44
mlx0: DAC960P/PD, 3 channels, firmware 2.49-0-00, 32MB RAM
ld0 at mlx0 unit 0: RAID6, online
ld0: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
ld1 at mlx0 unit 1: RAID6, offline
ld1: disabled
ld2 at mlx0 unit 2: RAID5, offline
ld2: disabled
mcpcia1 at mcbus0 mid 4: PCI Bridge
mcpcia1: Horse Revision 3, Left Handed Saddle Revision 3, CAP Revision 2
pci2 at mcpcia1 bus 0
pci2: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pceb0 at pci2 dev 1 function 0: Intel 82375EB/SB PCI-EISA Bridge (PCEB) (rev. 0x05)
vga0 at pci2 dev 2 function 0: S3 Trio32/64 (rev. 0x00)
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
pci_mem_find: void region
wsdisplay0 at vga0 (kbdmux ignored)
tlp0 at pci2 dev 3 function 0: DECchip 21140 Ethernet, pass 1.2
tlp0: broken MicroWire interface detected; setting SROM size to 1Kb
tlp0: interrupting at kn300 irq 12
tlp0: DEC DE500-XA, Ethernet address 00:00:f8:1e:38:a7
tlp0: 10baseT, 100baseTX, 100baseTX-FDX, 10baseT-FDX
fpa0 at pci2 dev 4 function 0: DEC DEFPA PCI FDDI SAS Controller
fpa0: FDDI address 08:00:2b:b7:68:e8, FW=3.20, HW=1, SMT V7.2
fpa0: FDDI Port = S (PMD = ANSI Multi-Mode)
fpa0: interrupting at kn300 irq 16
eisa0 at pceb0
isa0 at pceb0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux ignored)
pms0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pms0 (mux ignored)
lpt0 at isa0 port 0x3bc-0x3bf irq 7
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
stray kn300 irq 40
scsibus0: waiting 2 seconds for devices to settle...
siop0: alloc newcdb at PHY addr 0x887d4000
st0 at scsibus0 target 0 lun 0: <DEC, TLZ09     (C)DEC, 0167> SCSI2 1/sequential removable
st0: drive empty
st0: sync (100.0ns offset 8), 8-bit (10.000MB/s) transfers
cd0 at scsibus0 target 5 lun 0: <DEC, RRD45   (C) DEC, 0436> SCSI2 5/cdrom removable
cd0: async, 8-bit transfers
scsibus1: waiting 2 seconds for devices to settle...
stray kn300 irq 40
sd0 at scsibus1 target 0 lun 0: <DEC, RZ29B    (C) DEC, 0016> SCSI2 0/direct fixed
sd0: 4091 MB, 3708 cyl, 20 head, 113 sec, 512 bytes/sect x 8380080 sectors
stray kn300 irq 40
sd0: sync (100.0ns offset 12), 16-bit (20.000MB/s) transfers, tagged queueing
stray kn300 irq 40
raidattach: Asked for 8 units
Kernel internal RAIDframe activated
RAIDframe: Searching for raid components...
IPsec: Initialized Security Association Processing.
root on sd0a dumps on sd0b
mlx0: unit 1 offline
mountroot: trying nfs...
mountroot: trying msdos...
mountroot: trying cd9660...
mountroot: trying ffs...
readclock: 3/11/5/0/18/23=>1067991503 (1067986000)
root file system type: ffs
init: copying out path `/sbin/init' 11
mlx0: unit 0 online
mlx0: unit 1 offline
Type a quit character (usually ^\) to abort multi-user startup.
Tue Nov  4 19:18:25 EST 2003
swapctl: adding /dev/sd0b as swap device at priority 0
Starting file system checks:
/dev/rsd0a: file system is clean; not checking
/dev/rsd0d: file system is clean; not checking
/dev/rld0a: file system is clean; not checking
Can't open /dev/rld1a: Operation not supported by device
CAN'T CHECK FILE SYSTEM.
/dev/rld1a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
Can't open /dev/rld2a: Operation not supported by device
CAN'T CHECK FILE SYSTEM.
/dev/rld2a: UNEXPECTED INCONSISTENCY; RUN fsck_ffs MANUALLY.
/dev/rld0d: file system is clean; not checking
/dev/rld0e: file system is clean; not checking
THE FOLLOWING FILE SYSTEMS HAD AN UNEXPECTED INCONSISTENCY:
        ffs: /dev/rld1a (/build), ffs: /dev/rld2a (/mfbd)
Automatic file system check failed; help!

  N O T I C E :  Please do not use the console except to run shutdown!

We recommend creating a non-root account and using su(1) for root access.
Terminal type is vt100.                                                 
chmod: /tmp: Read-only file system
We recommend creating a non-root account and using su(1) for root access.
[console]<@> # mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline
mlx0: unit 0 online
mlx0: unit 1 offline

# mlxctl -a -v cstatus
DAC960P/PD, 3 channels, firmware 2.49-0-00, 32MB RAM
mlx_user_command: mlx_ccb_alloc = 35
CPU 0: fatal kernel trap:

CPU 0    trap entry = 0x2 (memory management fault)
CPU 0    a0         = 0x14
CPU 0    a1         = 0x1
CPU 0    a2         = 0x0
CPU 0    pc         = 0xfffffc0000344958
CPU 0    ra         = 0xfffffc0000344750
CPU 0    pv         = 0xfffffc00004667e0
CPU 0    curproc    = 0xfffffc00088ffcc0
CPU 0        pid = 57, comm = mlxctl

panic: trap
Stopped in pid 57 (mlxctl) at   cpu_Debugger+0x4:       ret     zero,(ra)
db> trace
cpu_Debugger() at cpu_Debugger+0x4
panic() at panic+0x168
trap() at trap+0x5fc
XentMM() at XentMM+0x20
--- memory management fault (from ipl 0) ---
mlx_user_command() at mlx_user_command+0x3d8
mlxioctl() at mlxioctl+0x2bc
spec_ioctl() at spec_ioctl+0x7c
vn_ioctl() at vn_ioctl+0x154
sys_ioctl() at sys_ioctl+0x4ec
syscall_plain() at syscall_plain+0x154
XentSys() at XentSys+0x58
--- syscall (54) ---
--- user mode ---
db> 
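
	A small C sketch for reading the trap registers above. It
	assumes the usual OSF/1 PALcode entMM convention (an assumption
	worth checking against the Alpha Architecture Reference Manual):
	a0 is the faulting virtual address, a1 the MMCSR code, and a2
	the cause (-1 instruction fetch, 0 load, 1 store). The function
	names are made up for illustration. Under that reading the
	fault is an access violation on a load from address 0x14,
	which looks like a NULL structure pointer being dereferenced
	at offset 0x14 inside mlx_user_command().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MMCSR code passed in a1 on an entMM trap (assumed OSF/1 PALcode values). */
static const char *mmcsr_name(long code)
{
	switch (code) {
	case 0: return "translation not valid";
	case 1: return "access violation";
	case 2: return "fault on read";
	case 3: return "fault on execute";
	case 4: return "fault on write";
	default: return "unknown MMCSR";
	}
}

/* Cause passed in a2 (assumed): -1 = instruction fetch, 0 = load, 1 = store. */
static const char *cause_name(long cause)
{
	switch (cause) {
	case -1: return "instruction fetch";
	case 0: return "load";
	case 1: return "store";
	default: return "unknown access";
	}
}

/* Format a one-line description of an MM fault into buf. */
static void describe_mm_fault(char *buf, size_t len, uint64_t va,
    long mmcsr, long cause, uint64_t pagesize)
{
	size_t used;

	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s on %s at VA 0x%llx",
	    mmcsr_name(mmcsr), cause_name(cause), (unsigned long long)va);
	used = strlen(buf);
	/*
	 * A fault inside the (normally unmapped) first page usually means
	 * a NULL struct pointer was dereferenced; va is then the offset of
	 * the field being accessed.
	 */
	if (va < pagesize && used + 1 < len)
		snprintf(buf + used, len - used,
		    " (first page: likely NULL pointer + 0x%llx)",
		    (unsigned long long)va);
}
```

	With the values from the panic (a0 = 0x14, a1 = 0x1, a2 = 0x0)
	and the 8192-byte page size from the boot messages,
	describe_mm_fault(buf, sizeof buf, 0x14, 1, 0, 8192) produces
	"access violation on load at VA 0x14 (first page: likely NULL
	pointer + 0x14)".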

	When I booted it on Saturday (Nov 1), everything seemed fine
	and all three logical volumes attached online:

[Sat Nov  1 16:09:28 2003]mlx0 at pci0 dev 3 function 0: Mylex RAID (v2 interface)
[Sat Nov  1 16:09:28 2003]mlx0: interrupting at kn300 irq 44
[Sat Nov  1 16:09:28 2003]mlx0: DAC960P/PD, 3 channels, firmware 2.49-0-00, 32MB RAM
[Sat Nov  1 16:09:28 2003]ld0 at mlx0 unit 0: RAID6, online
[Sat Nov  1 16:09:28 2003]ld0: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
[Sat Nov  1 16:09:28 2003]ld1 at mlx0 unit 1: RAID6, online
[Sat Nov  1 16:09:28 2003]ld1: 8182 MB, 4155 cyl, 64 head, 63 sec, 512 bytes/sect x 16756736 sectors
[Sat Nov  1 16:09:28 2003]ld2 at mlx0 unit 2: RAID5, online
[Sat Nov  1 16:09:28 2003]ld2: 28637 MB, 7272 cyl, 128 head, 63 sec, 512 bytes/sect x 58648576 sectors

>Fix:

	unknown
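
	One unverified possibility: the console message printed just
	before the trap shows mlx_ccb_alloc() returning 35, which is
	EAGAIN in NetBSD's <errno.h>. If mlx_user_command()'s error
	path then touches the CCB pointer that was never allocated,
	that would fit the load from address 0x14. The sketch below is
	a hypothetical, self-contained illustration of that failure
	mode and the guard that avoids it; struct ccb, ccb_alloc() and
	user_command() are made-up names, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a controller command block (not struct mlx_ccb). */
struct ccb {
	int	cmd;
	int	status;
	void	*data;
};

/* Hypothetical allocator: returns 0 and sets *ccbp, or an errno with *ccbp == NULL. */
static int ccb_alloc(struct ccb **ccbp, int available)
{
	assert(ccbp != NULL);
	*ccbp = NULL;
	if (!available)
		return EAGAIN;		/* no free command blocks right now */
	*ccbp = calloc(1, sizeof(**ccbp));
	return *ccbp == NULL ? ENOMEM : 0;
}

/* The error path must not dereference the CCB unless allocation succeeded. */
static int user_command(int available)
{
	struct ccb *ccb;
	int rv;

	if ((rv = ccb_alloc(&ccb, available)) != 0)
		return rv;		/* ccb is NULL here: return without touching it */

	ccb->cmd = 1;
	/* ... submit the command and wait for completion ... */
	rv = ccb->status;
	free(ccb);
	return rv;
}
```

	Here user_command(0) returns EAGAIN cleanly instead of
	faulting, and user_command(1) runs the normal path.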

>Release-Note:
>Audit-Trail:
>Unformatted: