tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: Problem with outstanding knotes and device detach - and a fix



> On Jul 18, 2022, at 2:03 AM, Taylor R Campbell <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
> 
>> Date: Sun, 17 Jul 2022 12:54:56 -0700
>> From: Jason Thorpe <thorpej%me.com@localhost>
>> 
>> Aaaaand another new version.  This:
>> 
>> ==> Creates a knote_impl structure that's private to kern_event.c
>> that has the new lock.  I took the opportunity to move kn_influx to
>> the knote_impl as well, since absolutely no one outside of
>> kern_event.c should touch it.  This change is ABI-compatible.
>> 
>> ==> Improves the comments about the locking rules.
>> 
>> There are no functional changes since the last patch.
> 
> LGTM, thanks for working on this!
> 
> If I understand correctly, the issue is that removing the kqueue
> listener (i.e., filter_detach) and detaching the device notifier
> (i.e., klist_fini, via seldestroy) can race.  A process can close its
> kqueue file descriptor, or EV_DELETE a specific event subscription, at
> the same time as a device it had subscribed to is detaching and
> calling seldestroy.  That is, the lifetimes of:
> 
> (a) a knote in a kqueue listener for any of its event subscriptions, and
> (b) a knote in a driver's klist for any of its subscribers,
> 
> are overlapping but controlled independently, one by the kqueue user
> and the other by the driver, e.g. when the device is physically
> yanked, so the destruction by one path might happen while the other is
> still in use.  filter_detach can't safely call into the driver if the
> knote is being freed by klist_fini; klist_fini can't safely return to
> the driver if filter_detach is still in progress.

It’s actually simpler than that, and doesn’t really involve even much of a race… it’s that any knote associated with a device that has just been yanked may be referring to a device instance’s private data (via kn->kn_hook), and has a kn_fop pointer that points to functions provided by the now-detached device instance.  In the case of a driver that is statically linked into the kernel, at least the function pointers are probably stable and the problem is a simple use-after-free… but in the case of a dynamically-loaded driver module, kn_fop might itself now point at garbage, if the driver is unloaded.

So, if the driver detach happens at any time before the kqueue listener is detached, there is the potential to go BOOM (it’s not guaranteed, because UAF bugs are sometimes like that).

> Serializing the call to .f_detach and the replacement of kn_fop in
> klist_fini with a mutex ensures that one happens after the other for
> each knote.  Using a mutex attached to the knote itself avoids
> problems with the identity of the knote changing (close, EV_DELETE)
> and avoids problems with the identity of the device notifier changing
> (device detach) while either one is trying to run.

More or less… the mutex around the filter ops ensures that kn_fop will point to EITHER the driver-provided filter ops OR the stub do-nothing filter ops, and by holding the mutex across the call into the filter ops, ensures that klist_fini() will not complete while a filter op call is in-progress, thus ensuring that the freeing of the memory that backs the klist being torn down will not complete until there are no callers inside the about-to-be-gone filter ops.

> The kn_fop member is supposed to be stable all use points, so it
> doesn't require additional synchronization like atomic_load/store_*:
> 
> - Before .f_attach returns, the kqueue logic won't touch it, so
>  .f_attach can safely set it without synchronization.
> 
> - By the time the driver calls klist_fini (seldestroy), the driver
>  promises never to KNOTE again so it will never be touched by
>  filter_event.

It also promises to not allow any further f_attach calls to succeed.  (This is not new; the same promise is already made for select, and has been for many years).

> - filter_touch is used only for EVFILT_TIMER and EVFILT_USER, which
>  don't put the knote on any klist so never use klist_fini on it.

Right.

> - filter_detach and klist_fini are serialized by the knote foplock.

Right.

-- thorpej




Home | Main Index | Thread Index | Old Index