Subject: Occasional stops / DMA errors on a DS10
To: None <port-alpha@netbsd.org>
From: Torsten Harenberg <harenberg@physik.uni-wuppertal.de>
List: port-alpha
Date: 08/25/2007 19:33:35
Dear all,

I'm using an "old" DS10 running 4.0_BETA2 as a server for the amanda  
tape backup system.

The machine is doing quite a good job. However, from time to time, is  
just stops and kernel seem to hang. Using the serial console, I can  
break into the debugger. And it always shows ioflush as the actual  
process when the machine is in that state.

Rebooting the machine gives a couple of DMA errors. See the log from  
the serial console attached.

Does anybody has an idea where the problem could be?

Thanks for any hint. Best regards and have a nice weekend,

   Torsten


-----

Welcome to minicom 2.1

OPTIONS: History Buffer, F-key Macros, Search History Buffer, I18n
Compiled on Feb 22 2007, 14:19:43.

Press CTRL-A Z for help on special keys

Stopped in pid 8.1 (ioflush) at netbsd:cpu_Debugger+0x4:         
ret     zero,(ra
)
db>
db> cont
Stopped in pid 8.1 (ioflush) at netbsd:cpu_Debugger+0x4:         
ret     zero,(ra
)
db> cont
Stopped in pid 8.1 (ioflush) at netbsd:cpu_Debugger+0x4:         
ret     zero,(ra
)
db> sync
syncing disks... 5 2 done
unmounting file systems...panic: lockmgr: locking against myself
Stopped in pid 8.1 (ioflush) at netbsd:cpu_Debugger+0x4:         
ret     zero,(ra
)
db> cont

dumping to dev 4,1 offset 525039
dump 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241  
240 239 23d


rebooting...


halted CPU 0

halt code = 5
HALT instruction executed
PC = fffffc0000300118

CPU 0 booting

Reseting I/O buses...
(boot dqa0.0.0.13.0 -flags A)
block 0 of dqa0.0.0.13.0 is a valid boot block
reading 14 blocks from dqa0.0.0.13.0
bootstrap code read in
base = 200000, image_start = 0, image_bytes = 1c00
initializing HWRPB at 2000
initializing page table at ff40000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code

NetBSD/alpha 3.0 FFS Primary Bootstrap
Jumping to entry point...

NetBSD/alpha 3.0 Secondary Bootstrap, Revision 1.13
(riz@faith.netbsd.org, Sun Dec 18 21:28:37 UTC 2005)

VMS PAL rev: 0x1004700010154
OSF PAL rev: 0x1003f0002014f
Switch to OSF PAL code succeeded.

Boot flags: A
8175744+322376 [446448+297368]=0x8d0bd0

Entering netbsd at 0xfffffc00003012e0...
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004,  
2005, 2006
     The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
     The Regents of the University of California.  All rights reserved.

NetBSD 4.0_BETA2 (alice1) #0: Wed Jul  4 21:09:31 CEST 2007
         root@backup.physik.uni-wuppertal.de:/ama_hold/alphaobj/sys/ 
arch/alpha/c1
COMPAQ AlphaServer DS10 617 MHz, s/n AY04204922
8192 byte page size, 1 processor.
total memory = 256 MB
(2824 KB reserved for PROM, 253 MB used by NetBSD)
avail memory = 240 MB
mainbus0 (root)
cpu0 at mainbus0: ID 0 (primary), 21264A-9
cpu0: Architecture extensions: 307<PAT,MVI,CIX,FIX,BWX>
tsc0 at mainbus0: 21272 Core Logic Chipset, Cchip rev 0
tsc0: 2 Dchips, 1 memory bus of 16 bytes
tsc0: arrays present: 256MB, 0MB, 0MB, 0MB, Dchip 0 rev 1
tsp0 at tsc0
pci0 at tsp0 bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
sio0 at pci0 dev 7 function 0: Acer Labs M1533 PCI-ISA Bridge (rev.  
0xc3)
tlp0 at pci0 dev 9 function 0: DECchip 21143 Ethernet, pass 4.1
tlp0: interrupting at dec 6600 irq 29
tlp0: DEC, Ethernet address 08:00:2b:87:1d:4e
tlp0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
tlp1 at pci0 dev 11 function 0: DECchip 21143 Ethernet, pass 4.1
tlp1: interrupting at dec 6600 irq 30
tlp1: DEC, Ethernet address 08:00:2b:87:18:3f
tlp1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
aceride0 at pci0 dev 13 function 0
aceride0: Acer Labs M5229 UDMA IDE Controller (rev. 0xc1)
aceride0: bus-master DMA support present
aceride0: primary channel wired to compatibility mode
aceride0: primary channel interrupting at isa irq 14
atabus0 at aceride0 channel 0
aceride0: secondary channel wired to compatibility mode
aceride0: secondary channel interrupting at isa irq 15
atabus1 at aceride0 channel 1
isp0 at pci0 dev 15 function 0: QLogic 1020 Fast Wide SCSI HBA
isp0: interrupting at dec 6600 irq 39
scsibus0 at isp0: 16 targets, 8 luns per target
re0 at pci0 dev 17 function 0: RealTek 8169/8110 Gigabit Ethernet  
(rev. 0x10)
re0: interrupting at dec 6600 irq 47
re0: Ethernet address 00:11:f0:00:0a:98
rgephy0 at re0 phy 7: RTL8169S/8110S 1000BASE-T media interface, rev. 2
rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,  
1000baseT-Fo
isa0 at sio0
lpt0 at isa0 port 0x3bc-0x3bf irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
pcppi0: children must have an explicit unit
midi0 at pcppi0: PC speaker (CPU-intensive output)
spkr0 at pcppi0
isabeep0 at pcppi0
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
mcclock0 at isa0 port 0x70-0x71: mc146818 or compatible
pcppi0: attached to attimer0
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
st0 at scsibus0 target 4 lun 0: <QUANTUM, DLT8000, 0119> tape removable
st0: density code 65, variable blocks, write-enabled
wd0 at atabus0 drive 0: <Maxtor 6L160P0>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 152 GB, 317632 cyl, 16 head, 63 sec, 512 bytes/sect x 320173056  
sectors
st0: sync (100.00ns offset 8), 16-bit (20.000MB/s) transfers
wd0: 32-bit data port
st1 at scsibus0 target 5 lun 0: <QUANTUM, DLT8000, 0119> tape removable
st1: wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6  
(Ultra/133)
wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33)  
(using DMA)
atapibus0 at atabus1: 2 targets
cd0 at atapibus0 drive 1: <Compaq  CRD-8402B, 2000/05/10, 1.03> cdrom  
removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
wd1 at atabus1 drive 0: <Maxtor 51024H2>
wd1: drive supports 16-sector PIO transfers, LBA addressing
wd1: 9541 MB, 19386 cyl, 16 head, 63 sec, 512 bytes/sect x 19541088  
sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
wd1(aceride0:1:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33)  
(using DMA)
cd0(aceride0:1:1): using PIO mode 4, DMA mode 2 (using DMA)
density code 65, variable blocks, write-enabled
st1: sync (100.00ns offset 8), 16-bit (20.000MB/s) transfers
ch0 at scsibus0 target 6 lun 0: <DEC, TL800    (C) DEC, 0524> changer  
removable
ch0: 15 slots, 1 drive, 1 picker, 0 portals
ch0: sync (100.00ns offset 8), 16-bit (20.000MB/s) transfers
ch1 at scsibus0 target 9 lun 0: <DEC, TL800    (C) DEC, 0524> changer  
removable
ch1: 15 slots, 1 drive, 1 picker, 0 portals
ch1: sync (100.00ns offset 8), 16-bit (20.000MB/s) transfers
stray isa irq 14
stray isa irq 15
root on wd0a dumps on wd0b
stray isa irq 14
stray isa irq 14
wd0: transfer error, downgrading to Ultra-DMA mode 1
wd0(aceride0:0:0): using PIO mode 4, Ultra-DMA mode 1 (using DMA)
wd0a: DMA error reading fsbn 128 of 128-143 (wd0 bn 128; cn 0 tn 2 sn  
2), retryg
stray isa irq 14
stray isa irq 14
wd0: soft error (corrected)
root file system type: ffs
WARNING: clock gained 3 days -- CHECK AND RESET THE DATE!
Sat Aug 25 17:22:30 GMT 2007
swapctl: adding /dev/wd0b as swap device at priority 0
Checking for botched superblock upgrades: done.
Starting file system checks:
/dev/rwd0a: 4385 files, 48861 used, 983634 free (458 frags, 122897  
blocks, 0.0%)
/dev/rwd0a: MARKING FILE SYSTEM CLEAN
/dev/rwd0d: INCORRECT BLOCK COUNT I=45885 (8 should be 0) (CORRECTED)
/dev/rwd0d: INCORRECT BLOCK COUNT I=45892 (8 should be 0) (CORRECTED)
/dev/rwd0d: INCORRECT BLOCK COUNT I=45897 (8 should be 0) (CORRECTED)
/dev/rwd0d: INCORRECT BLOCK COUNT I=45898 (12 should be 0) (CORRECTED)
/dev/rwd0d: INCORRECT BLOCK COUNT I=504186 (8 should be 0) (CORRECTED)
/dev/rwd0d: UNREF FILE I=45920  OWNER=backup MODE=100600
/dev/rwd0d: SIZE=3843 MTIME=Aug 23 00:52 2007  (CLEARED)
/dev/rwd0d: UNREF FILE I=45939  OWNER=backup MODE=100600
/dev/rwd0d: SIZE=4236 MTIME=Aug 23 00:53 2007  (CLEARED)
/dev/rwd0d: UNREF FILE I=46004  OWNER=backup MODE=100600
/dev/rwd0d: SIZE=3912 MTIME=Aug 24 02:26 2007  (CLEARED)
/dev/rwd0d: UNREF FILE I=46005  OWNER=backup MODE=100600
/dev/rwd0d: SIZE=3983 MTIME=Aug 24 02:26 2007  (CLEARED)
/dev/rwd0d: LINK COUNT FILE I=504186  OWNER=backup MODE=100600
/dev/rwd0d: SIZE=0 MTIME=Aug 24 02:27 2007  COUNT 2 SHOULD BE 1  
(ADJUSTED)
/dev/rwd0d: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rwd0d: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rwd0d: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rwd0d: 200459 files, 1070803 used, 1042420 free (45724 frags,  
124587 block)
/dev/rwd0d: MARKING FILE SYSTEM CLEAN
/dev/rwd0e: 926 files, 11845 used, 504398 free (118 frags, 63035  
blocks, 0.0% f)
/dev/rwd0e: MARKING FILE SYSTEM CLEAN
aceride0:0:0: lost interrupt
         type: ata tc_bcount: 2048 tc_skip: 0
aceride0:0:0: bus-master DMA error: missing interrupt, status=0x21
wd0f: DMA error reading fsbn 303890752 of 303890752-303890755 (wd0 bn  
319819168g
stray isa irq 14
stray isa irq 14
aceride0:0:0: lost interrupt
         type: ata tc_bcount: 2048 tc_skip: 0
aceride0:0:0: bus-master DMA error: missing interrupt, status=0x21
wd0f: DMA error reading fsbn 303890752 of 303890752-303890755 (wd0 bn  
319819168g
stray isa irq 14
stray isa irq 14
aceride0:0:0: lost interrupt
         type: ata tc_bcount: 2048 tc_skip: 0
aceride0:0:0: bus-master DMA error: missing interrupt, status=0x21
wd0f: DMA error reading fsbn 303890752 of 303890752-303890755 (wd0 bn  
319819168g
stray isa irq 14; stopped logging
aceride0:0:0: lost interrupt
         type: ata tc_bcount: 2048 tc_skip: 0
aceride0:0:0: bus-master DMA error: missing interrupt, status=0x21
wd0: transfer error, downgrading to PIO mode 4
wd0(aceride0:0:0): using PIO mode 4
wd0f: DMA error reading fsbn 303890752 of 303890752-303890755 (wd0 bn  
319819168g
wd0: soft error (corrected)
/dev/rwd0f: LINK COUNT FILE I=2015636  OWNER=backup MODE=100600
/dev/rwd0f: SIZE=144867328 MTIME=Aug 24 02:27 2007  COUNT 2 SHOULD BE  
1 (ADJUST)
/dev/rwd0f: FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/rwd0f: SUMMARY INFORMATION BAD (SALVAGED)
/dev/rwd0f: BLK(S) MISSING IN BIT MAPS (SALVAGED)
/dev/rwd0f: 115 files, 16490842 used, 58390319 free (55 frags,  
7298783 blocks, )
/dev/rwd0f: MARKING FILE SYSTEM CLEAN
Setting tty flags.
Setting sysctl variables:
Starting network.
Hostname: backup.physik.uni-wuppertal.de
IPv6 mode: host
Configuring network interfaces: tlp0.
add net default: gateway 132.195.111.254
Adding interface aliases:
Building databases...
Starting syslogd.
Checking for core dump...
Aug 25 19:25:03 backup /netbsd: stray isa irq 14
Aug 25 19:25:03 backup /netbsd: stray isa irq 15
Aug 25 19:25:03 backup /netbsd: stray isa irq 14
Aug 25 19:25:03 backup last message repeated 7 times
Aug 25 19:25:03 backup /netbsd: stray isa irq 14; stopped logging
savecore: reboot after panic: panic: lockmgr: locking against myself
Aug 25 19:25:03 backup savecore: reboot after panic: panic: lockmgr:  
locking agf
savecore: system went down at Sat Aug 25 19:19:38 2007
savecore: /var/crash/bounds: No such file or directory
savecore: writing compressed core to /var/crash/netbsd.0.core.gz
savecore: writing compressed kernel to /var/crash/netbsd.0.gz
Setting date via ntp.
Mounting all filesystems...
Clearing /tmp.
Checking quotas: done.
Setting securelevel: kern.securelevel: 0 -> 1
Starting virecover.
Starting local daemons:.
Updating motd.
Starting ntpd.
Starting sshd.
postfix/postfix-script: starting the Postfix mail system
Starting inetd.
Starting cron.
Sat Aug 25 19:26:42 CEST 2007

NetBSD/alpha (backup.physik.uni-wuppertal.de) (console)

login:
-----

--
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
<>                                                              <>
<> Torsten Harenberg         harenberg@physik.uni-wuppertal.de  <>
<> Bergische Universitaet                                       <>
<> FB C - Physik             Tel.: +49 (0)202 439-3521          <>
<> Gaussstr. 20              Fax : +49 (0)202 439-2811          <>
<> 42097 Wuppertal                                              <>
<>                                                              <>
<><><><><><><>< Of course it runs NetBSD http://www.netbsd.org ><>