Source-Changes-D archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: CVS commit: src/sys/kern



> Module Name:    src
> Committed By:   mlelstv
> Date:           Sun Jul 18 06:57:28 UTC 2021
> 
> Modified Files:
>         src/sys/kern: kern_ksyms.c
> 
> Log Message:
> skip symbol tables that were unloaded again to avoid EFAULT when reading
> ksyms.
> 
> also restore TAILQ_FOREACH idiom.

This change isn't quite right: Reading past st = ksyms_last_snapshot
in ksymsread, or in any context where we rely on the snapshot without
holding ksyms_lock, is not allowed -- it will lead to a corrupted view
of the snapshot, and possibly end up reading uninitialized memory.

That's why this code didn't use TAILQ_FOREACH -- it must stop at
ksyms_last_snapshot; if it gets to the end of the list, and witnesses
st = NULL, then that's a bug.

The code also didn't skip entries with st->sd_gone because we arrange
for st->sd_nmap not to be freed until the last ksymsclose, at which
point no more ksymsread can be active.  If you start skipping some
entries, that will cause assertions to fire about mismatched symbol
table size which we computed up front in ksymsopen.

I made a mistake in some previous changes: Although we defer freeing
st->sd_nmap while /dev/ksyms is open, we do not defer freeing
st->sd_strstart or st->sd_symstart.  And it is possible for a
concurrent module _unload_ to issue kobj_unload in the middle of
concurrent ksymsread and free st->sd_strstart or st->sd_symstart while
ksymsread is reading from them.  (My last round of changes was done to
fix bugs found during module _load_.)  The commit message doesn't say,
but I assume your change was meant to fix that bug.

Unfortunately, the change doesn't completely fix it -- we could easily
find ourselves in the following situation:

CPU 0                                   CPU 1
-----                                   -----
read from /dev/ksyms
                                        modunload foo.kmod
load st->sd_gone = false
                                        kobj_unload
                                        store st->sd_gone = true
                                        kobj_free(ko, ko->ko_symtab);
copy from st->sd_symstart -> FAULT

In order to avoid this situation, we either need to:

1. Make ksyms_modunload block until last /dev/ksyms close.

   => Easy enough: slap a condvar on ksyms_opencnt, wait for
      ksyms_opencnt == 0 in ksyms_modunload, notify if ksyms_opencnt
      == 0 in ksymsclose.

   => Holding /dev/ksyms open indefinitely can block module unload
      indefinitely; unclear if this might lead to problematic deadlock
      scenarios.

   => Bonus: Can eliminate some of the logic in ksymsclose, since
      ksyms_modunload will handle it synchronously before return
      anyway.

2. Convert ksymsread to use psref or localcount.

   => Takes a little more effort to convert tailq to pslist -- which
      requires some care because struct ksyms_symtab is shared with
      userland.  So we still have to keep the tailq entry records (or
      add disgusting compat code).  This is why I didn't go for
      pserialize in my last round of changes.

   => Complicated because we would only get a snapshot within a single
      ksymsread call, not from open to close of /dev/ksyms; this would
      require some kind of mechanism to persuade userland tools to
      restart if the snapshot has changed from read to read.

   => (Can't use pserialize because we have to hold the snapshot
      stable across copyout to userland, which may sleep, which is not
      allowed in pserialize read sections.)

3. Find some other way to copy the tables or something while
   /dev/ksyms is open so they won't go away when kobj_unload runs, or
   just keep copies of the strtab and symtab for ksyms that persist
   unload -- might cost a lot of kernel memory, probably not worth
   pursuing.

In any case this change should otherwise be reverted, since it is no
necessary (sd_symstart/sd_strstart will no longer go away during
ksymsread) and it is also wrong (iteration past ksyms_last_snapshot is
not allowed).

I'm leaning toward option (1); other thoughts welcome.


Home | Main Index | Thread Index | Old Index