tech-net archive


DIAGNOSTIC panic in unp_gc function



Hi All,

I have been seeing random panics on my NetBSD-6.0 based system, and the panics were in the AF_LOCAL protocol code (uipc_usrreq.c). One of the panics was:

uvm_fault(0xffffffff81a195e0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803042ff cs 8 rflags 10283 cr2  8 cpl 0 rsp fffffe810f27bab0
panic: trap
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
unp_detach() at netbsd:unp_detach+0x2e
uipc_usrreq() at netbsd:uipc_usrreq+0x79
soclose() at netbsd:soclose+0x79
soo_close() at netbsd:soo_close+0x1a
closef() at netbsd:closef+0x4a
unp_thread() at netbsd:unp_thread+0x3cb
cpu1: End traceback...

I then installed a NetBSD kernel built with the DIAGNOSTIC option, and the DIAGNOSTIC kernel panicked at the following line:

sys/kern/uipc_usrreq.c:1713 with current TOT for MAIN branch.

It seems that we can get into this code with a file pointer whose reference count is 0.
I got into this situation through the following steps:

1) I am passing a file descriptor between two processes using an AF_LOCAL socket.
2) Process 1 opens an AF_LOCAL socket with fd "x".
3) Process 1 then passes that same fd to process 2 as part of a control message sent over this fd.
4) Process 1 closes its fd.
5) The fd now exists only in the kernel, as part of a control message in the mbuf receive queue of the AF_LOCAL socket held by process 2.
6) Therefore "fp->f_count" of the passed fd is 1.
7) Now unp_thread kicks in to process the deferred closes.
8) It walks the "filehead" list and, for each UNIX domain socket, scans the receive mbufs.
9) It marks every file descriptor found in a control message that is itself a socket as "FDEFER".
10) Therefore the file descriptor sent by process 1 is marked as FDEFER.
11) Before the "filehead" list is rescanned (the FDEFER fd is ahead in the list), process 2 wakes up, receives the mbuf and closes the FDEFER file descriptor.
12) As there was only one reference on this file descriptor, the file is put into the "file_cache" pool cache when process 2 closes it.
13) The file is freed but still remains on the "filehead" list, because a file is removed from "filehead" only in the "file_dtor" function.
14) "file_dtor" is the "pc_dtor" function, and it is called only conditionally.
15) Now "unp_gc" rescans "filehead" and finds this file with "FDEFER" set and "f_count" equal to 0.
16) It hits the KASSERT and the system panics.
17) On a non-DIAGNOSTIC kernel it instead accesses data that is no longer valid and crashes elsewhere.
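
For anyone who wants to reproduce the user-space side of the steps above, here is a minimal sketch of passing a descriptor over an AF_LOCAL socket with an SCM_RIGHTS control message. The helper names send_fd() and recv_fd() are made up for illustration and are not taken from my application. The sketch only shows the descriptor-passing mechanism, not the timing needed to hit the race.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for exactly one descriptor, suitably aligned. */
union fd_ctl {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send fd_to_pass over the AF_LOCAL socket sock.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd_to_pass)
{
    char byte = 'x';                       /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* payload is a file descriptor */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock.
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In terms of the steps, process 1 would call send_fd(x, x) and then close(x), which leaves the only reference inside the queued control message until process 2 calls recv_fd() and closes the result.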

I looked at the current code, and this code still exists and looks similar. I think we should either increase the f_count of this file pointer while marking it FDEFER and then clean up when we get into this loop, or check for a file pointer with "FDEFER" set and "f_count == 0" at line 1713 and just "continue", as this could be a valid case.
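
To make the two options concrete, here is a toy user-space model of them. It is not the code in uipc_usrreq.c: struct toy_file, TOY_FDEFER and all the toy_* functions are hypothetical stand-ins, and the model ignores locking and the pool cache entirely. It only shows how each option changes what the rescan sees.

```c
#include <stddef.h>

#define TOY_FDEFER 0x1          /* hypothetical stand-in for FDEFER */

struct toy_file {
    int f_count;                /* reference count */
    int f_flag;                 /* TOY_FDEFER while queued for a rescan */
    struct toy_file *next;      /* stand-in for the "filehead" linkage */
};

/* Models a close by the receiver: drop one reference. */
void toy_close(struct toy_file *fp)
{
    fp->f_count--;
}

/* Option 1, marking: take an extra reference together with the flag,
 * so the count cannot reach 0 before the rescan. */
void toy_mark_held(struct toy_file *fp)
{
    fp->f_flag |= TOY_FDEFER;
    fp->f_count++;
}

/* Option 1, rescan: clear the flag and drop the extra reference.
 * Returns the number of deferred files handled. */
int toy_rescan_held(struct toy_file *head)
{
    int handled = 0;
    for (struct toy_file *fp = head; fp != NULL; fp = fp->next) {
        if (!(fp->f_flag & TOY_FDEFER))
            continue;
        fp->f_flag &= ~TOY_FDEFER;
        fp->f_count--;          /* release the hold taken when marking */
        handled++;
    }
    return handled;
}

/* Option 2, marking: flag only, no extra reference (current behaviour
 * as described in the steps). */
void toy_mark(struct toy_file *fp)
{
    fp->f_flag |= TOY_FDEFER;
}

/* Option 2, rescan: skip entries that were closed after being marked
 * (TOY_FDEFER set, f_count == 0) instead of asserting on them.
 * Returns the number of live deferred files handled. */
int toy_rescan_skip(struct toy_file *head)
{
    int handled = 0;
    for (struct toy_file *fp = head; fp != NULL; fp = fp->next) {
        if (!(fp->f_flag & TOY_FDEFER))
            continue;
        if (fp->f_count == 0)
            continue;           /* stale entry still on the list */
        fp->f_flag &= ~TOY_FDEFER;
        handled++;
    }
    return handled;
}
```

With option 1 the file closed by process 2 still has a count of 1 when the rescan reaches it, so the existing assertion would hold. With option 2 the rescan simply steps over it.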

Please advise.

Thanks
Ritesh


