Subject: kern/28594: 2.0_RC5 lock-up/loop in checkaliases()
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <andreas@planix.com>
List: netbsd-bugs
Date: 12/09/2004 14:30:00
>Number:         28594
>Category:       kern
>Synopsis:       2.0_RC5 lock-up/lookp in checkaliases()
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Dec 09 14:30:00 +0000 2004
>Originator:     Andreas Wrede <andreas@planix.com>
>Release:        NetBSD 2.0_RC5
>Organization:
Planix, Inc.
>Environment:
	
	
System: NetBSD whome.planix.com 2.0_RC5 NetBSD 2.0_RC5 (PLANIX) #11: Tue Nov 23 10:49:49 EST 2004 root@willy.wrede.pvt:/u1/netbsd-2.0/obj/sys/arch/i386/compile.i386/PLANIX i386
Architecture: i386
Machine: i386
>Description:
	Every second or third night, during one of the find's in 
/etc/{daily|security}, the NetBSD 2.0/i3896 server locks up. Keystrokes no
longer echo on the serial console. Entering the debugger
usually works. When trying to "reboot" , I get a "panic: lockmgr: locking 
against myself". At the time of the lock-up, one of the XServe RAID based
1TByte file system was mounted:

df -h
Filesystem    Size     Used     Avail Capacity  Mounted on
/dev/raid1a   1.4G     1.1G      232M    83%    /
/dev/raid1e   2.0G     1.4G      479M    75%    /var
/dev/raid1f   3.9G     2.4M      3.7G     0%    /u1
/dev/sd0a     1.0T     326G      632G    34%    /u5
kernfs        1.0K     1.0K        0B   100%    /kern
procfs        4.0K     4.0K        0B   100%    /proc

/u5 is a ffsv1 filesystem 32 blocks short of the 1Tb mark. The same lock-up
occurs when the /u5 filesystem is 1Tb+ ffv2.

Stopped in pid 25005.1 (find) at        netbsd:cpu_Debugger+0x4:        leave
db> bt
cpu_Debugger(cc0e4b8c,c037ddb0,cc0e4b74,7ff,c1557000) at netbsd:cpu_Debugger+0x4  
comintr(c12b8200,0,cb7d0010,30,cc0e0010) at netbsd:comintr+0x6b9  
Xintr_legacy4() at netbsd:Xintr_legacy4+0xa4   
--- interrupt ---  
checkalias(cfdf8688,120c,c12e6000,cfdfa160,c1908000) at netbsd:checkalias+0x5e   
ufs_vinit(c12e6000,c128c300,c128c200,cc0e4ca8,c23528c0) at netbsd:ufs_vinit+0x69  
ffs_vget(c12e6000,3978196,cc0e4d64,d595eb70,cc0e4cf8) at netbsd:ffs_vget+0x274  
ufs_lookup(cc0e4d94,cfdf8540,cc0e4dac,c037d409,c05730a0) at netbsd:ufs_lookup+0x6d4 
VOP_LOOKUP(cf3ed444,cc0e4e84,cc0e4e98,cc0e4e84,c0573820) at netbsd:VOP_LOOKUP+0x2e 
lookup(cc0e4e74,cbfa6c00,400,cc0e4e8c,cc0e4e24) at netbsd:lookup+0x201  
namei(cc0e4e74,8081448,60,0,8081540) at netbsd:namei+0x138  
sys___lstat13(cd02e2ac,cc0e4f64,cc0e4f5c,0,c153f000) at netbsd:sys___lstat13+0x58 
syscall_plain() at netbsd:syscall_plain+0x7e   
--- syscall (number 280) ---  
0x480e7357:  
db> reboot  
syncing disks... panic: lockmgr: locking against myself  
Begin traceback...   
lockmgr(cf3ed4b4,10012,cf3ed444,1,0) at netbsd:lockmgr+0xcb  
genfs_lock(cc0e4790,0,c02bc0fc,10001,c05737e0) at netbsd:genfs_lock+0x19   
VOP_LOCK(cf3ed444,10012,c078eb40,c0705440,cf3ed444) at netbsd:VOP_LOCK+0x28  
vn_lock(cf3ed444,10012,0,c12e6000,cf3ed444) at netbsd:vn_lock+0x69  
vget(cf3ed444,10012,c06fb9a4,cd02e2ac,0) at netbsd:vget+0x9b  
ffs_sync(c12e6000,2,c12fc400,ccd76010,c0100f6e) at netbsd:ffs_sync+0x1b6  
sys_sync(cd02e2ac,0,0,0,0) at netbsd:sys_sync+0xca  
vfs_shutdown(0,0,cc0e4898,c0310c45,0) at netbsd:vfs_shutdown+0x5f  
cpu_reboot(0,0,cc0e4898,c0310c1b,cc0e48b8) at netbsd:cpu_reboot+0x15f  
db_reboot_cmd(1,0,c025c01d,cc0e48c0,d) at netbsd:db_reboot_cmd+0x4d   
db_command(c06f8764,c05707e0,cc0e49b8,c031046d,c065ad90) at netbsd:db_command+0xe3   
db_command_loop(c03d5940,61ad,1,ccd76169,0) at netbsd:db_command_loop+0x8e  
db_trap(1,0,c12d2e88,cc0e4a1c,c0127986) at netbsd:db_trap+0xd4  
kdb_trap(1,0,cc0e4b14,cc0e1000,caeb90ac) at netbsd:kdb_trap+0x132  
trap() at netbsd:trap+0xd0  
--- trap (number 1) ---  
cpu_Debugger(cc0e4b8c,c037ddb0,cc0e4b74,7ff,c1557000) at netbsd:cpu_Debugger+0x4  
comintr(c12b8200,0,cb7d0010,30,cc0e0010) at netbsd:comintr+0x6b9  
Xintr_legacy4() at netbsd:Xintr_legacy4+0xa4   
--- interrupt ---  
checkalias(cfdf8688,120c,c12e6000,cfdfa160,c1908000) at netbsd:checkalias+0x5e   
ufs_vinit(c12e6000,c128c300,c128c200,cc0e4ca8,c23528c0) at netbsd:ufs_vinit+0x69  
ffs_vget(c12e6000,3978196,cc0e4d64,d595eb70,cc0e4cf8) at netbsd:ffs_vget+0x274  
ufs_lookup(cc0e4d94,cfdf8540,cc0e4dac,c037d409,c05730a0) at netbsd:ufs_lookup+0x6d4  
VOP_LOOKUP(cf3ed444,cc0e4e84,cc0e4e98,cc0e4e84,c0573820) at netbsd:VOP_LOOKUP+0x2e   
lookup(cc0e4e74,cbfa6c00,400,cc0e4e8c,cc0e4e24) at netbsd:lookup+0x201  
namei(cc0e4e74,8081448,60,0,8081540) at netbsd:namei+0x138  
sys___lstat13(cd02e2ac,cc0e4f64,cc0e4f5c,0,c153f000) at netbsd:sys___lstat13+0x58   
syscall_plain() at netbsd:syscall_plain+0x7e   
--- syscall (number 280) ---  
0x480e7357:  
End traceback...  

repeatedly continuing execution and breaking into the debugger sometimes
finds us in routines "below" checkaliases:
:
lockmgr(cb72dd40,10012,cb72dcd0,c0705440,cc0e3b84) at netbsd:lockmgr+0x806
genfs_lock(cc0e3b74,cc0e3b98,c03d38f7,c12b8200,c05737e0) at netbsd:genfs_lock+0x19
VOP_LOCK(cb72dcd0,10012,c0102c91,2,cb72dcd0) at netbsd:VOP_LOCK+0x28
vn_lock(cb72dcd0,10012,cc0e3bfc,d15d9104,cb72dcd0) at netbsd:vn_lock+0x69
vget(cb72dcd0,10012,8,282,cb767294) at netbsd:vget+0x9b
checkalias(ced12d98,120c,c12e5000,cff04f28,c18cd800) at netbsd:checkalias+0x119

A 201Mb core file is available.

Console log:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.
[Tue Nov 30 06:44:49 2004]
NetBSD 2.0_RC5 (PLANIX) #11: Tue Nov 23 10:49:49 EST 2004
    root@willy.wrede.pvt:/u1/netbsd-2.0/obj/sys/arch/i386/compile.i386/PLANIX
total memory = 639 MB
avail memory = 617 MB
BIOS32 rev. 0 found at 0xfd7d2
mainbus0 (root)
cpu0 at mainbus0: (uniprocessor)
cpu0: Intel Pentium III (686-class), 920.67 MHz, id 0x683
cpu0: features 387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 387fbff<PGE,MCA,CMOV,PAT,PSE36,PN,MMX>
cpu0: features 387fbff<FXSR,SSE>
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 256 KB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: serial number 0000-0683-0003-B1AD-7648-C2F3
cpu0: 8 page colors
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x05)
pchb1 at pci0 dev 0 function 1
pchb1: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x05)
pci1 at pchb1 bus 3
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
isp0 at pci1 dev 4 function 0: QLogic FC-AL and Fabric HBA
isp0: interrupting at irq 11
scsibus0 at isp0: 256 targets, 8 luns per target
isp1 at pci1 dev 5 function 0: QLogic FC-AL and Fabric HBA
isp1: interrupting at irq 11
scsibus1 at isp1: 256 targets, 8 luns per target
esiop0 at pci1 dev 6 function 0: Symbios Logic 53c896 (ultra2-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at irq 5
esiop0: alloc new tag DSA table at PHY addr 0x2184000
scsibus2 at esiop0: 16 targets, 8 luns per target
esiop1 at pci1 dev 6 function 1: Symbios Logic 53c896 (ultra2-wide scsi)
esiop1: using on-board RAM
esiop1: interrupting at irq 5
esiop1: alloc new tag DSA table at PHY addr 0x2185000
scsibus3 at esiop1: 16 targets, 8 luns per target
fxp0 at pci0 dev 4 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at irq 9
fxp0: Ethernet address 00:10:83:fd:80:7b
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga1 at pci0 dev 5 function 0: ATI Technologies 3D Rage IIC (rev. 0x7a)
wsdisplay0 at vga1 kbdmux 1
wsmux1: connecting to wsdisplay0
pcib0 at pci0 dev 15 function 0
pcib0: ServerWorks OSB4 ISA bridge (rev. 0x4f)
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at irq 14
atabus0 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at irq 15
atabus1 at rccide0 channel 1
isa0 at pcib0
lpt0 at isa0 port 0x378-0x37b irq 7
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 mux 1
wskbd0: connecting to wsdisplay0
pmsprobe: reset error 5
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff: using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
isapnp0: no ISA Plug 'n Play devices found
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
raidattach: Asked for 8 units
Kernelized RAIDframe activated
IPsec: Initialized Security Association Processing.
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
scsibus2: waiting 2 seconds for devices to settle...
scsibus3: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <HITACHI CDR-8435, , 0010> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2 (using DMA data transfers)
sd0 at scsibus0 target 0 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd0: 1035 GB, 132522 cyl, 128 head, 128 sec, 512 bytes/sect x 2171240448 sectors
sd1 at scsibus1 target 0 lun 0: <APPLE, Xserve RAID, 1.21> disk fixed
sd1: 1035 GB, 132522 cyl, 128 head, 128 sec, 512 bytes/sect x 2171240448 sectors
esiop0: alloc newcdb at PHY addr 0x2247000
sd2 at scsibus2 target 0 lun 0: <COMPAQ, AB01831AC4, 3A05> disk fixed
sd2: 17366 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35566000 sectors
sd2: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd3 at scsibus2 target 1 lun 0: <COMPAQ, AB01831AC4, 3A05> disk fixed
sd3: 17366 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35566000 sectors
sd3: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd4 at scsibus2 target 2 lun 0: <COMPAQ, AB0183346B, 3A05> disk fixed
sd4: 17366 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35566000 sectors
sd4: sync (50.00ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
sd5 at scsibus2 target 3 lun 0: <COMPAQ, HB01831B95, A195> disk fixed
sd5: 17366 MB, 7001 cyl, 20 head, 254 sec, 512 bytes/sect x 35566000 sectors
esiop0: alloc new tag DSA table at PHY addr 0x2256000
sd5: sync (50.00ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
ses0 at scsibus2 target 5 lun 0: <HP, SAFTE; U160/M BP, 1021> processor fixed
ses0: SAF-TE Compliant Device
ses0: async, 8-bit transfers
sd6 at scsibus2 target 8 lun 0: <IBM, DNES-318350Y, SAH0> disk fixed
sd6: 17501 MB, 11474 cyl, 10 head, 312 sec, 512 bytes/sect x 35843670 sectors
sd6: sync (50.00ns offset 31), 16-bit (40.000MB/s) transfers, tagged queueing
esiop1: alloc newcdb at PHY addr 0x225d000
sd7 at scsibus3 target 0 lun 0: <HP, 9.10GB A 80-DC1B, DC1B> disk fixed
sd7: 8678 MB, 8387 cyl, 10 head, 211 sec, 512 bytes/sect x 17773524 sectors
sd7: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
sd8 at scsibus3 target 1 lun 0: <HP, 9.10GB A 80-DC1B, DC1B> disk fixed
sd8: 8678 MB, 8387 cyl, 10 head, 211 sec, 512 bytes/sect x 17773524 sectors
sd8: sync (50.00ns offset 15), 16-bit (40.000MB/s) transfers, tagged queueing
Searching for RAID components...
Component on: sd2a: 35565937
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 5
   Version: 2 Serial Number: 2004060711 Mod Counter: 414
   Clean: Yes Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 35565856
   Autoconfig: Yes
   Contains root partition: No
   Last configured as: raid0
Component on: sd3a: 35565937
   Row: 0 Column: 1 Num Rows: 1 Num Columns: 5
   Version: 2 Serial Number: 2004060711 Mod Counter: 414
   Clean: Yes Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 35565856
   Autoconfig: Yes
   Contains root partition: No
   Last configured as: raid0
Component on: sd4a: 35565937
   Row: 0 Column: 2 Num Rows: 1 Num Columns: 5
   Version: 2 Serial Number: 2004060711 Mod Counter: 414
   Clean: Yes Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 35565856
   Autoconfig: Yes
   Contains root partition: No
   Last configured as: raid0
Component on: sd5a: 35565937
   Row: 0 Column: 3 Num Rows: 1 Num Columns: 5
   Version: 2 Serial Number: 2004060711 Mod Counter: 414
   Clean: Yes Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 35565856
   Autoconfig: Yes
   Contains root partition: No
   Last configured as: raid0
Component on: sd6a: 35565937
   Row: 0 Column: 4 Num Rows: 1 Num Columns: 5
   Version: 2 Serial Number: 2004060711 Mod Counter: 414
   Clean: Yes Status: 0
   sectPerSU: 32 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 5  blocksize: 512 numBlocks: 35565856
   Autoconfig: Yes
   Contains root partition: No
   Last configured as: raid0
Component on: sd7a: 17773461
   Row: 0 Column: 0 Num Rows: 1 Num Columns: 2
   Version: 2 Serial Number: 2004060712 Mod Counter: 841
   Clean: No Status: 0
   sectPerSU: 64 SUsPerPU: 1 SUsPerRU: 1
   RAID Level: 1  blocksize: 512 numBlocks: 17773376
   Autoconfig: Yes
   Contains root partition: Yes
   Last configured as: raid1
Found: sd2a at 0
Found: sd3a at 1
Found: sd4a at 2
Found: sd5a at 3
Found: sd6a at 4
RAID autoconfigure
Configuring raid0:
Starting autoconfiguration of RAID set...
Looking for 0 in autoconfig
Found: sd2a at 0
Looking for 1 in autoconfig
Found: sd3a at 1
Looking for 2 in autoconfig
Found: sd4a at 2
Looking for 3 in autoconfig
Found: sd5a at 3
Looking for 4 in autoconfig
Found: sd6a at 4
raid0: allocating 50 buffers of 16384 bytes.
raid0: RAID Level 5
raid0: Components: /dev/sd2a /dev/sd3a /dev/sd4a /dev/sd5a /dev/sd6a
raid0: Total Sectors: 142263424 (69464 MB)
Found: sd7a at 0
RAID autoconfigure
Configuring raid1:
Starting autoconfiguration of RAID set...
Looking for 0 in autoconfig
Found: sd7a at 0
Looking for 1 in autoconfig
raid1: allocating 20 buffers of 32768 bytes.
raid1: RAID Level 1
raid1: Components: /dev/sd7a component1[**FAILED**]
raid1: Total Sectors: 17773376 (8678 MB)
boot device: raid1
root on raid1a dumps on raid1b
mountroot: trying msdos...
mountroot: trying cd9660...
mountroot: trying nfs...
mountroot: trying lfs...
mountroot: trying ext2fs...
mountroot: trying ffs...
root file system type: ffs
init: copying out path `/sbin/init' 11

>How-To-Repeat:
	update a NetBSD 2.0/i386 server fom 60Gb RAIDframe to 2x1000Gb Xerve
RAID. Wait a couple of days.

>Fix:
Unknown

>Unformatted: