Re: kern/48212: modunload(8) for nfsserver leaves a dangling callout scheduled

On Sun, 15 Sep 2013, Martin Husemann wrote:

The nfs timer callout should be disestablished by
nfsserver_modcmd(MODULE_CMD_FINI) -> nfs_fini() -> nfs_timer_fini().

Are there other hidden timers in the code that I overlooked?
Could you verify that the above call chain happens on module unload for you?

Further investigation shows that this specific callout is shared between the nfs client and the nfs server. The nfs_timer() callout routine itself checks whether nfs_srvvec is set and, if so, calls it. Both the check and the call are protected by the nfs_timer_lock mutex. The crash I am seeing happens when nfs_timer() tries to grab that mutex!

The code in nfsserver_modcmd() that is supposed to handle this is the call to nfs_timer_srvfini(), which also grabs the mutex and then sets nfs_srvvec to NULL.
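
To make the locking pattern described above concrete, here is a minimal user-space sketch. It is not the kernel code: it uses a pthread mutex in place of the kernel mutex, and every model_* name as well as srv_ticks is hypothetical and exists only for illustration. Only nfs_timer_lock and nfs_srvvec are names taken from this report.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space model; a pthread mutex stands in for the kernel mutex. */
static pthread_mutex_t nfs_timer_lock = PTHREAD_MUTEX_INITIALIZER;
static void (*nfs_srvvec)(void);  /* server hook; NULL while no server is attached */
static int srv_ticks;             /* model only: counts calls made through the hook */

/* Stand-in for the server's periodic work. */
static void model_srv_timer(void)
{
    srv_ticks++;
}

/* Server attach: install the hook while holding the lock. */
static void model_timer_srvinit(void)
{
    pthread_mutex_lock(&nfs_timer_lock);
    assert(nfs_srvvec == NULL);   /* the model allows only one server hook */
    nfs_srvvec = model_srv_timer;
    pthread_mutex_unlock(&nfs_timer_lock);
}

/* Server detach: clear the hook while holding the lock. */
static void model_timer_srvfini(void)
{
    pthread_mutex_lock(&nfs_timer_lock);
    nfs_srvvec = NULL;
    pthread_mutex_unlock(&nfs_timer_lock);
}

/* Shared timer body: the check and the call happen under the same lock. */
static void model_nfs_timer(void)
{
    pthread_mutex_lock(&nfs_timer_lock);
    if (nfs_srvvec != NULL)
        nfs_srvvec();
    pthread_mutex_unlock(&nfs_timer_lock);
}
```

In this model, a call to model_nfs_timer() after model_timer_srvfini() no longer reaches the server hook. Note that the pattern only protects the call through the hook. It says nothing about the lifetime of the mutex itself, and the crash reported here occurs while acquiring the mutex, before the hook is even examined.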

The crash is 100% reproducible on a 6-core machine, and I have confirmed that nfsserver_modcmd() is definitely being invoked during modunload.

The failing instruction within mutex_vector_enter() (at offset 0x91) is

        movq 0x18(%r15), %rax

This corresponds to line 402 in sys/kern/kern_mutex.c

397             /*
398              * See lwp_dtor() why dereference of the LWP pointer is safe.
399              * We must have kernel preemption disabled for that.
400              */
401             l = (lwp_t *)MUTEX_OWNER(owner);
402             ci = l->l_cpu;

ddb says that r15 (which holds the pointer l) contains 0xfffffffffffffff0 (i.e., -0x10), so the effective address of the movq instruction, the load of l->l_cpu, would be 0x8.
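
The address arithmetic can be double-checked with a few lines of C. This is only a sketch of the calculation; the function names are hypothetical, and the 0x18 displacement is simply the one taken from the disassembled instruction above.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of disp(%reg) on amd64: base plus displacement, modulo 2^64. */
static uint64_t effective_address(uint64_t base, int32_t disp)
{
    return base + (uint64_t)(int64_t)disp;
}

/* Reinterpret a 64-bit register value as a signed integer. */
static int64_t as_signed(uint64_t reg)
{
    return (int64_t)reg;
}

/* Check the values quoted from ddb and the disassembly. */
static void check_reported_values(void)
{
    const uint64_t r15 = 0xfffffffffffffff0ULL;

    assert(as_signed(r15) == -0x10);
    assert(effective_address(r15, 0x18) == 0x8);
}
```

The sum wraps around past 2^64, which is why a register value that looks like a very high address ends up referencing address 0x8.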

