DIAGNOSTIC panic in unp_gc function

To: tech-net%netbsd.org@localhost
Subject: DIAGNOSTIC panic in unp_gc function
From: Ritesh Agrawal <ritesh.agrawal%teznetworks.com@localhost>
Date: Fri, 27 Feb 2015 18:30:50 +0530

Hi All,

I have been seeing random panics on my NetBSD-6.0 based system, and this panic was in the AF_LOCAL protocol code (uipc_usrreq.c). One of the panics was:


uvm_fault(0xffffffff81a195e0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803042ff cs 8 rflags 10283 cr2  8 cpl 0 rsp
fffffe810f27bab0
panic: trap
cpu1: Begin traceback...
printf_nolog() at netbsd:printf_nolog
startlwp() at netbsd:startlwp
alltraps() at netbsd:alltraps+0x96
unp_detach() at netbsd:unp_detach+0x2e
uipc_usrreq() at netbsd:uipc_usrreq+0x79
soclose() at netbsd:soclose+0x79
soo_close() at netbsd:soo_close+0x1a
closef() at netbsd:closef+0x4a
unp_thread() at netbsd:unp_thread+0x3cb
cpu1: End traceback...

I then installed a NetBSD kernel built with the "DIAGNOSTIC" option, and the DIAGNOSTIC kernel panicked on the following line:


sys/kern/uipc_usrreq.c:1713 with current TOT for MAIN branch.

It seems that we can get into this code with a file pointer whose reference count is 0.

I got into this situation through the following steps:

1) I am passing a file between two processes using an AF_LOCAL socket.
2) Process 1 opens an AF_LOCAL socket with fd "x".
3) Process 1 then passes the same fd to process 2 as part of a control message, using this fd.
4) Process 1 closes its fd.
5) This fd now exists only in the kernel, as part of a control message in the mbuf receive queue of the AF_LOCAL socket held by process 2.
6) Therefore "fp->f_count" of the passed fd is 1.
7) Now unp_thread kicks in to process the deferred closes.
8) It walks the "filehead" list and, for UNIX domain sockets, scans the receive mbufs.
9) It marks every file descriptor in a control message that is a socket as "FDEFER".
10) Therefore the file descriptor sent by process 1 is marked as FDEFER.
11) Before the "filehead" list is rescanned (the FDEFER fd is further ahead in the list), process 2 wakes up, receives the mbuf, and closes the FDEFER file descriptor.
12) As there was only one reference on this file descriptor, the file is put into the "file_cache" pool cache when process 2 closes it.
13) The file is freed, but it still remains on the "filehead" list, because a file is removed from "filehead" only in the "file_dtor" function.
14) "file_dtor" is a "pc_dtor" function, and these are called only conditionally.
15) Now "unp_gc" rescans "filehead" and finds this file descriptor with "FDEFER" set and the file pointer's "f_count" at 0.
16) It hits the KASSERT and the system panics.
17) On a non-DIAGNOSTIC kernel, it accesses data that is no longer valid and crashes elsewhere.
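
For reference, the userland side of the steps above (open a socket, pass it in a control message, close the sender's copy, then receive and close it) can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it runs in a single process for brevity, it sends the socket "x" over a separate socketpair() channel rather than over "x" itself, and the helper names send_fd(), recv_fd() and pass_and_close() are made up for illustration. It does not by itself reproduce the race with unp_thread.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for exactly one passed descriptor. */
union fd_ctl {
    struct cmsghdr hdr;                  /* forces correct alignment */
    char buf[CMSG_SPACE(sizeof(int))];
};

/* Hypothetical helper: send fd over sock as an SCM_RIGHTS control message. */
static int send_fd(int sock, int fd)
{
    char data = 'x';                     /* one byte of ordinary payload */
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from sock, or -1 on error. */
static int recv_fd(int sock)
{
    char data;
    struct iovec iov = { .iov_base = &data, .iov_len = 1 };
    union fd_ctl ctl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&ctl, 0, sizeof(ctl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Walk through steps 2-5 and 11-12 in one process; 0 on success. */
int pass_and_close(void)
{
    int chan[2];                         /* stands in for the two processes */
    int x, received;

    if (socketpair(AF_LOCAL, SOCK_STREAM, 0, chan) == -1)
        return -1;
    x = socket(AF_LOCAL, SOCK_STREAM, 0);            /* step 2 */
    if (x == -1 || send_fd(chan[0], x) == -1) {      /* step 3 */
        if (x != -1)
            close(x);
        close(chan[0]);
        close(chan[1]);
        return -1;
    }
    close(x);       /* step 4: only the in-flight message references it now */
    received = recv_fd(chan[1]);                     /* step 11: receive */
    if (received != -1)
        close(received);                             /* step 12: last close */
    close(chan[0]);
    close(chan[1]);
    return received == -1 ? -1 : 0;
}
```

Between the close(x) and the recv_fd() call, the passed socket is referenced only by the control message sitting in the receive queue, which is the window described in steps 5 through 11.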

I looked at the current code, and this code still exists and looks similar. I think we should either increase the f_count of this file pointer while marking it FDEFER and then clean up when we get into this loop, or check for a file pointer with "FDEFER" and "f_count == 0" at line 1713 and just "continue", as this could be a valid case.
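
To make the two options concrete, here is a toy model of the rescan loop. It is not the kernel code: struct toy_file, TOY_FDEFER, toy_mark_hold() and toy_rescan() are all hypothetical stand-ins, and the real loop in unp_gc does more work and runs under locks that are not modelled here.

```c
#include <stddef.h>

#define TOY_FDEFER 0x1          /* stand-in for the kernel's FDEFER flag */

/* Toy stand-in for a file on the "filehead" list. */
struct toy_file {
    int f_count;                /* reference count; 0 means already freed */
    int f_flag;                 /* TOY_FDEFER when marked by the first scan */
    struct toy_file *next;      /* stand-in for the list linkage */
};

/*
 * Option 1: take an extra reference while marking, so that a close by
 * the receiver cannot drop f_count to 0 before the rescan.
 */
static void toy_mark_hold(struct toy_file *fp)
{
    fp->f_flag |= TOY_FDEFER;
    fp->f_count++;
}

/*
 * Rescan the list. If held is nonzero, each deferred file carries the
 * extra reference from toy_mark_hold(), which is dropped here.
 * Option 2: a deferred file whose f_count is already 0 is skipped
 * instead of being asserted on. Returns the number of files processed
 * and stores the number skipped in *skipped.
 */
static int toy_rescan(struct toy_file *head, int held, int *skipped)
{
    int processed = 0;

    *skipped = 0;
    for (struct toy_file *fp = head; fp != NULL; fp = fp->next) {
        if ((fp->f_flag & TOY_FDEFER) == 0)
            continue;
        if (fp->f_count == 0) {          /* freed since it was marked */
            (*skipped)++;
            continue;
        }
        fp->f_flag &= ~TOY_FDEFER;
        if (held)
            fp->f_count--;               /* drop the reference from marking */
        processed++;
    }
    return processed;
}
```

With option 1 the rescan never sees a deferred file with a zero count, at the cost of having to release the extra reference on every path out of the loop. Option 2 is a smaller change, but it still reads fields of a file that has already gone back to the pool cache.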


Please advise.

Thanks
Ritesh

Follow-Ups:
- Re: DIAGNOSTIC panic in unp_gc function
  - From: Christos Zoulas
