In article <54F06A82.2030109%teznetworks.com@localhost>,
Ritesh Agrawal <ritesh.agrawal%teznetworks.com@localhost> wrote:
Hi All,
I have been seeing random panics on my NetBSD-6.0 based system and this
panic was in AF_LOCAL protocol code (uipc_usrreq.c). One of the panic was:
uvm_fault(0xffffffff81a195e0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803042ff cs 8 rflags 10283 cr2 8 cpl 0 rsp
fffffe810f27bab0
panic: trap
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
unp_detach() at netbsd:unp_detach+0x2e
uipc_usrreq() at netbsd:uipc_usrreq+0x79
soclose() at netbsd:soclose+0x79
soo_close() at netbsd:soo_close+0x1a
closef() at netbsd:closef+0x4a
unp_thread() at netbsd:unp_thread+0x3cb
cpu1: End traceback...
I than installed the NetBSD kernel with "DIAGNOSTIC option" and the
DIAGNOSTIC kernel paniced on following line no:
sys/kern/uipc_usrreq.c:1713 with current TOT for MAIN branch.
It seems that we can get into this code with file pointer reference
count as 0.
I got into this situation by following steps:
1) I am passing file between two processes using AF_LOCAL socket.
2) process 1 opens a AF_LOCAL socket with fd "x"
3) now process 1 passes the same fd as part of control message to
process 2 using this fd.
4) process 1 closes its fd
5) This fd is now only in kernel as part of control message of mbuf
receive Q for the AF_LOCAL socket held by process 2
6) Therefore "fp->f_count" of the passed fd is 1
7) Now unp_thread kicks in to process the deferred closed
8) It looks into the filehead list and for UNIX domain socket scans the
receive mbufs
9) It marks all the file descriptor which are socket in control message
as "FDEFER"
10) Therefore the file descriptor sent by process 1 is marked as FDEFER
11) Before the "filehead" list is rescanned again (FDEFER fd is ahead in
list), process 2 wakes up and receives the mbuf and closes the FDEFER
file descriptor
12) As there was only one reference count on this file descriptor, the
file is put into "file_cache" pool cache when it is closed by process 2.
13) This file is freed but it still remains in "filehead" list because
file is removed from "filehead" list in "file_dtor" function.
14) "file_dtor" function is "pc_dtor" function and are called
conditionally.
15) Now the "unp_gc" rescans the "filehead" and finds this file
descriptor with "FDEFER" set and file pointer "f_count" as 0
16) It hits KASSERT and system panics
17) If it is not DIAGNOSTIC kernel then it access data which are not
valid and crashes elsewhere.
I saw the current code and this code still exist and looks similar. I
think we should either increase the f_count of this file pointer while
marking it FDEFER and then cleanup when we get into this loop. We can
also check for file pointer with "FDEFER" and "f_count == 0" in line
1713 and just "continue" as this could be valid case.
Yes, I've seen that too but it is rare. Can you make an example program
that triggers it?