tech-kern archive


Re: fd code multithreaded race?



On Sat, Jul 31, 2010 at 08:31:19PM +0300, Antti Kantee wrote:
> Hi,
> 
> I'm looking at a KASSERT which is triggering quite rarely for me (in
> terms of iterations):
> 
> panic: kernel diagnostic assertion "dt->dt_ff[i]->ff_refcnt == 0" failed:
> file "/usr/allsrc/src/sys/rump/librump/rumpkern/../../../kern/kern_descrip.c",
> line 856
> 
> Upon closer examination, it seems that this can trigger while another
> thread is in fd_getfile() between upping the refcount, testing for
> ff_file, and fd_putfile().  Removing the KASSERT seems to restore correct

You're right there, the KASSERT() is wrong, it should be removed.

> operation, but I didn't read the code far enough to see where the race
> is actually handled and what stops the code from using the wrong file.

FYI, the fdfile_t structures (per-descriptor records) are stable for the
lifetime of the process.  What each record describes can and of course does
change, and so does how those records are pointed to (fdtab_t).
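
To make the window concrete, here is a minimal single-threaded sketch that
replays one interleaving by hand.  It is not the kern_descrip.c code: every
name in it (sketch_fdfile, sketch_lookup_ref and so on) is hypothetical, the
counter is a plain integer rather than an atomic, and the three lookup steps
are only assumed to mirror the "up the refcount, test ff_file, put" sequence
described in the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the kern_descrip.c definitions. */
struct sketch_file { int unused; };

struct sketch_fdfile {
	unsigned refcnt;		/* transient references held by lookups */
	struct sketch_file *file;	/* NULL while the descriptor slot is free */
};

/* Step 1 of a lookup: take a reference before inspecting the slot. */
static void
sketch_lookup_ref(struct sketch_fdfile *ff)
{
	ff->refcnt++;
}

/* Step 2: test whether the slot currently describes an open file. */
static struct sketch_file *
sketch_lookup_test(struct sketch_fdfile *ff)
{
	return ff->file;
}

/* Step 3: drop the reference again (the "put" side). */
static void
sketch_lookup_unref(struct sketch_fdfile *ff)
{
	ff->refcnt--;
}

/*
 * Replay one interleaving: a lookup on a free slot is paused between
 * steps 1 and 3.  Returns the refcount that a second thread would
 * observe on that free slot while the lookup is paused.
 */
static unsigned
sketch_observed_in_window(void)
{
	struct sketch_fdfile ff = { 0, NULL };	/* free slot, no file */
	unsigned seen;

	sketch_lookup_ref(&ff);
	seen = ff.refcnt;	/* what an "assert refcnt == 0" would see */
	if (sketch_lookup_test(&ff) == NULL)
		sketch_lookup_unref(&ff);

	assert(ff.refcnt == 0);	/* balanced again once the lookup ends */
	return seen;
}
```

Calling sketch_observed_in_window() shows a non-zero count on a slot that
holds no file, even though the count is balanced again as soon as the lookup
finishes.  In this model an assertion that a free slot's refcount is zero can
fire without any reference actually being leaked.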
 
There isn't really a concept of "wrong file", as in, the app gets
what it asked for.  It is free to ask for the wrong thing, and it's free
to ask for the right thing at the wrong time, etc - that's its problem.

Unless you're alluding to another bug?

> How-to-repeat:
> Run tests/fs/puffs/t_fuzz mountfuzz7 in a loop.  A multiprocessor kernel
> might produce a more reliable result, so set RUMP_NCPU unless you have
> a multiprocessor host.  Depending on timings and how the get/put thread
> runs, you might even see the refcount as 0 in the core.
> 
> Does anyone see something wrong with the analysis?  If not, I'll create
> a dedicated test and file a PR.

