Source-Changes archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

CVS commit: [netbsd-10] src/sys



Module Name:    src
Committed By:   martin
Date:           Tue Aug  1 14:49:06 UTC 2023

Modified Files:
        src/sys/dev/dkwedge [netbsd-10]: dk.c
        src/sys/kern [netbsd-10]: subr_disk.c
        src/sys/sys [netbsd-10]: disk.h

Log Message:
Pull up following revision(s) (requested by riastradh in ticket #284):

        sys/dev/dkwedge/dk.c                            1.125-1.158
        sys/kern/subr_disk.c                            1.135-1.137
        sys/sys/disk.h                                  1.78

dk(4): Explain why dk_rawopens can't overflow and assert it.

dk(4): Restore assertions in dklastclose.

We only enter dklastclose if the wedge is open (sc->sc_dk.dk_openmask
!= 0), which can happen only if dkfirstopen has succeeded, in which
case we hold a dk_rawopens reference to the parent that prevents
anyone else from closing it.  Hence sc->sc_parent->dk_rawopens > 0.

On open, sc->sc_parent->dk_rawvp is set to nonnull, and it is only
reset to null on close.  Hence if the parent is still open, as it
must be here, sc->sc_parent->dk_rawvp must be nonnull.

dk(4): Avoid holding dkwedges_lock while allocating array.

This is not great -- we shouldn't be choosing the unit number here
anyway; we should just let autoconf do it for us -- but it's better
than potentially blocking any dk_openlock or dk_rawlock (which are
sometimes held when waiting for dkwedges_lock) for memory allocation.

dk(4): KNF: return (v) -> return v.
No functional change intended.

dk(4): KNF: Whitespace.
No functional change intended.

dk(4): Omit needless void * cast.
No functional change intended.

dk(4): Fix typo in comment: dkstrategy, not dkstragegy.
No functional change intended.

dk(4): ENXIO, not ENODEV, means no such device.
ENXIO is `device not configured', meaning there is no such device.
ENODEV is `operation not supported by device', meaning the device is
there but refuses the operation, like writing to a read-only medium.

Exception: For undefined ioctl commands, it's not ENODEV _or_ ENXIO,
but rather ENOTTY, because why make any of this obvious when you
could make it obscure Unix lore?

dk(4): KNF: Sort includes.
No functional change intended.

dk(4): <sys/rwlock.h> for rwlock(9).

dk(4): Prevent races in access to struct dkwedge_softc::sc_size.
Rules:
1. Only ever increases, never decreases.
   (Decreases require removing and readding the wedge.)
2. Increases are serialized by dk_openlock.
3. Reads can happen unlocked in any context where the softc is valid.

Access is gathered into dkwedge_size* subroutines -- don't touch
sc_size outside these.  For now, we use rwlock(9) to keep the
reasoning simple.  This should be done with atomics on 64-bit
platforms and a seqlock on 32-bit platforms to avoid contention.

However, we can do that in a later change.

dk(4): Move CFDRIVER_DECL and CFATTACH_DECL3_NEW earlier in file.

Follows the pattern of most drivers, and will be necessary for
referencing dk_cd in dk_bdevsw and dk_cdevsw soon, to prevent
open/detach races.
No functional change intended.

dk(4): Don't touch dkwedges or ndkwedges outside dkwedges_lock.

dk(4): Assert parent vp is nonnull before we stash it away.

Let's enable early attribution if this goes wrong.

If it's not the parent's first open, also assert the parent vp is
already nonnull.

dk(4): Assert dkwedges[unit] is the sc we're about to free.

dk(4): Require dk_openlock in dk_set_geometry.

Not strictly necessary but this makes reasoning easier and documents
with an assertion how disk_set_info is serialized.

disk(9): Fix use-after-free race with concurrent disk_set_info.

This can happen with dk(4), which allows wedges to have their size
increased without destroying and recreating the device instance.

Drivers which allow concurrent disk_set_info and disk_ioctl must
serialize disk_set_info with dk_openlock.

dk(4): Add null d_cancel routine to devsw.

This way, dkclose is guaranteed that dkopen, dkread, dkwrite,
dkioctl, &c., have all returned before it runs.  For block opens,
setting d_cancel also guarantees that any buffered writes are flushed
with vinvalbuf before dkclose is called.

dk(4): Fix callout detach race.
1. Set a flag sc_iostop under the lock sc_iolock so dkwedge_detach
   and dkstart don't race over it.
2. Decline to schedule the callout if sc_iostop is set.  The callout
   is already only ever scheduled while the lock is held.
3. Use callout_halt to wait for any concurrent callout to complete.
   At this point, it can't reschedule itself.

Without this change, the callout could be concurrently rescheduling
itself as we issue callout_stop, leading to use-after-free later.

dk(4): Use disk_begindetach and rely on vdevgone to close instances.

The first step is to decide whether we can detach (if forced, yes; if
not forced, only if not already open), and prevent new opens if so.

There's no need to start closing open instances at this point --
we're just making a decision to detach, and preventing new opens by
transitioning state that dkopen will respect[*].

The second step is to force all open instances to close.  This is
done by vdevgone.  By the time vdevgone returns, there can be no open
instances, so if there _were_ any, closing them via vdevgone will
have passed through dklastclose.

After that point, there can be no opens and no I/O operations, so
dk_openmask must already be zero and the bufq must be empty.

Thus, there's no need to have an explicit call to dklastclose (via
dkwedge_cleanup_parent) before or after making the decision to
detach.
[*] Currently access to this state is racy: nothing serializes
    dkwedge_detach's state transition with dkopen's test.  TBD in a
    separate commit shortly.

dk(4): Set .d_cfdriver and .d_devtounit to plug open/detach race.

This way, opening dkN or rdkN will wait if attach or detach is still
in progress, and vdevgone will wake up such pending opens and make
them fail.  So it is no longer possible for a wedge to be detached
after dkopen has already started using it.

For now, we use a custom .d_devtounit function that looks up the
autoconf unit number via the dkwedges array, which conceivably may
use an independent unit numbering system -- nothing guarantees they
match up.  (In practice they will mostly match up, but concurrent
wedge creation could lead to different numbering.)  Eventually this
should be changed so the two numbering systems match, which would let
us delete the new dkunit function and just use dev_minor_unit like
many other drivers can.

dk(4): Take a read-lock on dkwedges_lock if we're only reading.
- dkwedge_find_by_name
- dkwedge_find_by_parent
- dkwedge_print_wnames

dk(4): Omit needless locking in dksize, dkdump.

All the members these use are stable after initialization, except for
the wedge size, which dkwedge_size safely reads a snapshot of without
locking in the caller.

dk(4): dkdump: Simplify.  No functional change intended.

dk(4): Narrow the scope of the device numbering lookup on detach.

Just need it for vdevgone, order relative to other things in detach
doesn't matter.
No functional change intended.

disk(9): Fix missing unlock in error branch in previous change.

dk(4): Fix racy access to sc->sc_dk.dk_openmask in dkwedge_delall1.
Need sc->sc_parent->dk_rawlock for this, as used in dkopen/dkclose.

dk(4): Convert tests to assertions in various devsw operations.
.d_cancel, .d_strategy, .d_read, .d_write, .d_ioctl, and .d_discard
are only ever used between successful .d_open return and entry to
.d_close.  .d_open doesn't return until sc is nonnull and sc_state is
RUNNING, and dkwedge_detach waits for the last .d_close before
setting sc_state to DEAD.  So there is no possibility for sc to be
null or for sc_state to be anything other than RUNNING or DYING.

There is a small functional change here but only in the event of a
race: in the short window between when dkwedge_detach is entered, and
when .d_close runs, any I/O operations (read, write, ioctl, &c.) may
be issued that would have failed with ENXIO before.

This shouldn't matter for anything: disk I/O operations are supposed
to complete reasonably promptly, and these operations _could_ have
begun milliseconds prior, before dkwedge_detach was entered, so it's
not a significant distinction.

Notes:
- .d_open must still contend with trying to open a nonexistent wedge,
  of course.
- .d_close must also contend with closing a nonexistent wedge, in
  case there were two calls to open in quick succession and the first
  failed while the second hadn't yet determined it would fail.
- .d_size and .d_dump are used from ddb without any open/close.

dk(4): Fix lock assertion in size increase: parent's, not wedge's.

dk(4): Rename label for consistency.  No functional change intended.

dk(4): dkclose must handle a dying wedge too to close the parent.

Otherwise the parent open leaks on detach (or revoke) when the wedge
was open and had to be forcibly closed.

Fixes assertion sc->sc_dk.dk_openmask == 0.
ioctl(DIOCRMWEDGES): Delete only idle wedges.

Don't forcibly delete busy wedges.

Fixes accidental destruction of the busy wedge that the root file
system is mounted on, triggered by syzbot's ioctl(DIOCRMWEDGES).

dk(4): Omit needless sc_iopend, sc_dkdrn mechanism.
vdevgone guarantees that all instances are closed by the time it
returns, which in turn guarantees all I/O operations (read, write,
ioctl, &c.) have completed, and, if the block device is open,
vinvalbuf(V_SAVE) -> vflushbuf has completed, which forces all
buffered transfers to be issued and waits for them to complete.
So by the time vdevgone returns, no further transfers can be
submitted and the bufq must be empty.

dk(4): Fix typo: sc_state, not sc_satte.

Had tested a patch series, but not every patch in it, and I
inadvertently fixed the typo in a later patch in the series, not in
the one I committed.

dk(4): Make it clearer that dkopen EROFS branch doesn't leak.
It looked like we may need to sometimes call dklastclose in error
branch for the case of (flags & ~sc->sc_mode & FWRITE) != 0, but it
is not actually possible to reach that case: if the caller requested
read/write, and the parent is read-only, and it is the first time
we've opened the parent, then dkfirstopen will fail with EROFS so we
never get there.

But this is confusing and it looked like the error branch is wrong,
so let's rearrange the conditional to make it clearer that we cannot
goto out after dkfirstopen has succeeded.  And then assert that the
case cannot happen when we do call dkfirstopen.

dk(4): Need pdk->dk_openlock to read pdk->dk_wedges.


To generate a diff of this commit:
cvs rdiff -u -r1.124 -r1.124.4.1 src/sys/dev/dkwedge/dk.c
cvs rdiff -u -r1.134 -r1.134.4.1 src/sys/kern/subr_disk.c
cvs rdiff -u -r1.77 -r1.77.10.1 src/sys/sys/disk.h

Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.




Home | Main Index | Thread Index | Old Index