Re: DIAGNOSTIC panic in unp_gc function

To: tech-net%netbsd.org@localhost
Subject: Re: DIAGNOSTIC panic in unp_gc function
From: christos%astron.com@localhost (Christos Zoulas)
Date: Fri, 27 Feb 2015 14:29:11 +0000 (UTC)

In article <54F06A82.2030109%teznetworks.com@localhost>,
Ritesh Agrawal  <ritesh.agrawal%teznetworks.com@localhost> wrote:
>Hi All,
>
>I have been seeing random panics on my NetBSD-6.0 based system, in the 
>AF_LOCAL protocol code (uipc_usrreq.c). One of the panics was:
>
>uvm_fault(0xffffffff81a195e0, 0x0, 1) -> e
>fatal page fault in supervisor mode
>trap type 6 code 0 rip ffffffff803042ff cs 8 rflags 10283 cr2  8 cpl 0 rsp
>fffffe810f27bab0
>panic: trap
>cpu1: Begin traceback...
>printf_nolog() at netbsd:printf_nolog
>startlwp() at netbsd:startlwp
>alltraps() at netbsd:alltraps+0x96
>unp_detach() at netbsd:unp_detach+0x2e
>uipc_usrreq() at netbsd:uipc_usrreq+0x79
>soclose() at netbsd:soclose+0x79
>soo_close() at netbsd:soo_close+0x1a
>closef() at netbsd:closef+0x4a
>unp_thread() at netbsd:unp_thread+0x3cb
>cpu1: End traceback...
>
>I then installed a NetBSD kernel built with the DIAGNOSTIC option, and the 
>DIAGNOSTIC kernel panicked at the following line:
>
>sys/kern/uipc_usrreq.c:1713 with current TOT for MAIN branch.
>
>It seems that we can get into this code with a file pointer reference 
>count of 0.
>I got into this situation by the following steps:
>
>1) I am passing a file between two processes using an AF_LOCAL socket.
>2) process 1 opens an AF_LOCAL socket with fd "x"
>3) now process 1 passes the same fd as part of a control message to 
>process 2 using this fd.
>4) process 1 closes its fd
>5) This fd is now only in the kernel, as part of a control message in the 
>mbuf receive queue of the AF_LOCAL socket held by process 2
>6) Therefore "fp->f_count" of the passed fd is 1
>7) Now unp_thread kicks in to process the deferred closes
>8) It looks into the filehead list and, for UNIX domain sockets, scans the 
>receive mbufs
>9) It marks all file descriptors in control messages that are sockets 
>as "FDEFER"
>10) Therefore the file descriptor sent by process 1 is marked as FDEFER
>11) Before the "filehead" list is rescanned (the FDEFER fd is ahead in the 
>list), process 2 wakes up, receives the mbuf and closes the FDEFER 
>file descriptor
>12) As there was only one reference on this file descriptor, the 
>file is put into the "file_cache" pool cache when it is closed by process 2.
>13) This file is freed but it still remains in the "filehead" list because 
>a file is removed from the "filehead" list in the "file_dtor" function.
>14) "file_dtor" is the "pc_dtor" function, and it is called 
>conditionally.
>15) Now "unp_gc" rescans the "filehead" list and finds this file 
>descriptor with "FDEFER" set and the file pointer "f_count" as 0
>16) It hits the KASSERT and the system panics
>17) If it is not a DIAGNOSTIC kernel, it accesses data that is no longer 
>valid and crashes elsewhere.
>
>I looked at the current code, and this code still exists and looks similar. 
>I think we should either increase the f_count of this file pointer while 
>marking it FDEFER and then clean up when we get into this loop, or 
>check for a file pointer with "FDEFER" and "f_count == 0" at line 
>1713 and just "continue", as this could be a valid case.

Yes, I've seen that too but it is rare. Can you make an example program
that triggers it?

christos
