NetBSD-Bugs archive


Re: kern/46402 (LWPs created after exit_lwp() is called can hang the process...)



The following reply was made to PR kern/46402; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/46402 (LWPs created after exit_lwp() is called can hang the
 process...)
Date: Tue, 6 Nov 2012 20:10:33 +0000

 two mails not sent to gnats
 
    ------
 
 From: Mindaugas Rasiukevicius <rmind%netbsd.org@localhost>
 To: Greg Oster <oster%cs.usask.ca@localhost>
 Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost,
        oster%netbsd.org@localhost
 Subject: Re: kern/46402 (LWPs created after exit_lwp() is called can hang the
        process...)
 Date: Sat, 22 Sep 2012 14:21:40 +0100
 
 Greg Oster <oster%cs.usask.ca@localhost> wrote:
 >  > Synopsis: LWPs created after exit_lwp() is called can hang the
 >  > process...
 >  > 
 >  > State-Changed-From-To: analyzed->feedback
 >  > State-Changed-By: rmind%NetBSD.org@localhost
 >  > State-Changed-When: Wed, 19 Sep 2012 21:23:06 +0000
 >  > State-Changed-Why:
 >  > Can you please try the following patch?
 >  > 
 >  > http://www.netbsd.org/~rmind/lwp_wait_fix.diff
 >  > 
 >  
 >  After 376883 iterations of the test loop, the test hung:
 >  
 >  USER      PID %CPU %MEM    VSZ   RSS TTY     STAT STARTED    TIME COMMAND UID   PID  PPID   CPU PRI NI    VSZ   RSS WCHAN    STAT TTY        TIME COMMAND
 >  root    27779  0.0  0.1 798304  1396 ttyp0   DEl+  2:19AM 0:00.01 (t_cond   0 27779   410 93985  63  0 798304  1396 -        DEl+ ttyp0   0:00.01 (t_cond)
 >  root    27779  0.0  0.1 798304  1396 ttyp0   DEl+  2:19AM 0:00.01 (t_cond   0 27779   410 93985  95  0 798304  1396 lwpwait  DEl+ ttyp0   0:00.01 (t_cond)
 >  
 >  (sorry for the cut'n'paste mess... weekend plans have changed, and I
 >  have to run.. I can attempt to test more over the next days...)
 
 Thanks.  No hurry. :)  Perhaps you could get the output of ps from crash(8)
 or DDB?  Also, contents of struct proc of that process might be useful.
 
 -- 
 Mindaugas
 
 From: Greg Oster <oster%cs.usask.ca@localhost>
 To: Mindaugas Rasiukevicius <rmind%netbsd.org@localhost>
 Cc: gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
 Subject: Re: kern/46402 (LWPs created after exit_lwp() is called can hang the
        process...)
 Date: Wed, 26 Sep 2012 16:18:50 -0600
 
 On Sat, 22 Sep 2012 14:21:40 +0100
 Mindaugas Rasiukevicius <rmind%netbsd.org@localhost> wrote:
 
 > Greg Oster <oster%cs.usask.ca@localhost> wrote:
 > >  > Synopsis: LWPs created after exit_lwp() is called can hang the
 > >  > process...
 > >  > 
 > >  > State-Changed-From-To: analyzed->feedback
 > >  > State-Changed-By: rmind%NetBSD.org@localhost
 > >  > State-Changed-When: Wed, 19 Sep 2012 21:23:06 +0000
 > >  > State-Changed-Why:
 > >  > Can you please try the following patch?
 > >  > 
 > >  > http://www.netbsd.org/~rmind/lwp_wait_fix.diff
 > >  > 
 > >  
 > >  After 376883 iterations of the test loop, the test hung:
 > >  
 > >  USER      PID %CPU %MEM    VSZ   RSS TTY     STAT STARTED    TIME COMMAND UID   PID  PPID   CPU PRI NI    VSZ   RSS WCHAN    STAT TTY        TIME COMMAND
 > >  root    27779  0.0  0.1 798304  1396 ttyp0   DEl+  2:19AM 0:00.01 (t_cond   0 27779   410 93985  63  0 798304  1396 -        DEl+ ttyp0   0:00.01 (t_cond)
 > >  root    27779  0.0  0.1 798304  1396 ttyp0   DEl+  2:19AM 0:00.01 (t_cond   0 27779   410 93985  95  0 798304  1396 lwpwait  DEl+ ttyp0   0:00.01 (t_cond)
 > >  
 > >  (sorry for the cut'n'paste mess... weekend plans have changed, and
 > > I have to run.. I can attempt to test more over the next days...)
 > 
 > Thanks.  No hurry. :)  Perhaps you could get the output of ps from
 > crash(8) or DDB?  Also, contents of struct proc of that process might
 > be useful.
 
 I can't explain why, but in another 3 million runs the above was the
 *only* issue encountered.  When I test without the patch, it hangs
 instantly (most recently, on the very first test case).
 
 It's entirely possible that the t_cond hang was caused by something
 unrelated... and I've been unable to replicate the issue since
 rebooting the machine after the last hang.
 
 I'd say "Please commit this patch".
 
 Later...
 
 Greg Oster
 

