tech-kern archive

WAPBL panic



So, while investigating my WAPBL performance problems, it looks like I can
crash the machine (not reliably, but more often than not) with a simple
        seq 1 3000 | xargs mkdir
command. I get the following backtrace in ddb (wetware OCR):

panic: wapbl_register_deallocation: out of resources
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff8016f01d cs 8 rflags 246 cr2 ffff80011fc2d000 
cpl 0 rsp fffffe811e0fe6f0
Stopped in pid 12551.1 (mkdir) at       netbsd:breakpoint+0x5:  leave
db{3}> bt
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
wapbl_register_inode() at netbsd:wapbl_register_inode
ffs_truncate() at netbsd:ffs_truncate+0x917
ufs_direnter() at netbsd:ufs_direnter+0x481
ufs_mkdir() at netbsd:ufs_mkdir+0x617
VOP_MKDIR() at netbsd:VOP_MKDIR+0x3b
do_sys_mkdir() at netbsd:do_sys_mkdir+0x10f
syscall() at netbsd:syscall+0xc4
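
For anyone who wants to try this on a scratch file system, the command above
can be wrapped roughly as follows. This is only a sketch: the function name
make_dirs and the example path are placeholders of mine, and it assumes the
target directory lives on a file system mounted with the log option on a
machine you can afford to crash.

```shell
#!/bin/sh
# Hypothetical reproduction helper (name and arguments are illustrative).
# make_dirs DIR [COUNT]: create COUNT numbered directories under DIR.
make_dirs() {
    target=$1
    count=${2:-3000}    # default matches the 3000 used in the report
    mkdir -p "$target" || return 1
    # Same pipeline as in the report, run inside the scratch directory
    # in a subshell so the caller's working directory is unchanged.
    (cd "$target" && seq 1 "$count" | xargs mkdir)
}

# Example invocation (path is a placeholder):
#   make_dirs /scratch/wapbl-test 3000
```

Running it several times against fresh directories matches the "more often
than not" behaviour better than a single run would.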

Taking a crash dump is not practical because it would take an estimated four
to five hours. Is there any reasonable way to get a dump out of a 16G box?

On reboot, while mounting one file system (NOT the one I was operating on
when the crash happened), the "replaying log to disk" step took several minutes.
I physically walked to the server to see whether the discs were
actually busy, and there was a strange pattern: out of the five discs that
the RAID was built on, four were blinking at ~7Hz while the fifth was idle.
The position of the idle disc changed regularly (about every two
seconds), but I could not find a pattern in how it moved around. Possibly
two discs were sometimes idle at the same time.
Any idea why that took so long? The file system in question is small.
