kern/50725: ffs -o discard crash/unmount safety issues

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/50725: ffs -o discard crash/unmount safety issues
From: dholland%eecs.harvard.edu@localhost
Date: Fri, 29 Jan 2016 18:10:00 +0000 (UTC)

>Number:         50725
>Category:       kern
>Synopsis:       ffs -o discard crash/unmount safety issues
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jan 29 18:10:00 +0000 2016
>Originator:     David A. Holland
>Release:        NetBSD 7.99.25 (20151222)
>Organization:
>Environment:
System: NetBSD macaran 7.99.25 NetBSD 7.99.25 (MACARAN) #34: Tue Dec 22 23:55:33 EST 2015 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:

When mounted with -o discard, freed blocks are discarded in the
background. They aren't actually marked free in the block bitmap until
the discard is completed. (This is necessary because if such a block
is reallocated and written to before the discard is done, it'll
potentially be lost.)
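
As a rough illustration of the deferred-free rule described above, here
is a toy userspace model in C. It is not the ffs code: struct toy_fs and
every function name in it are made up for this sketch. The allocation
bit stays set while a discard is in flight and is only cleared by the
completion step, so the allocator cannot hand the block out early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 8

/* Toy volume (hypothetical): in_use stands in for the block bitmap,
 * discarding is purely in-memory bookkeeping. */
struct toy_fs {
	bool in_use[NBLOCKS];
	bool discarding[NBLOCKS];
};

/* Freeing a block only queues a discard; the bitmap bit stays set. */
static void
toy_free_block(struct toy_fs *fs, size_t b)
{
	assert(b < NBLOCKS);
	fs->discarding[b] = true;
}

/* Discard completion: only now is the block marked free. */
static void
toy_discard_done(struct toy_fs *fs, size_t b)
{
	assert(b < NBLOCKS);
	fs->discarding[b] = false;
	fs->in_use[b] = false;
}

/* Allocate the first free block, or return -1 if none is free. */
static long
toy_alloc_block(struct toy_fs *fs)
{
	for (size_t b = 0; b < NBLOCKS; b++) {
		if (!fs->in_use[b]) {
			fs->in_use[b] = true;
			return (long)b;
		}
	}
	return -1;
}

/* Number of blocks the bitmap reports as free. */
static size_t
toy_free_count(const struct toy_fs *fs)
{
	size_t n = 0;

	for (size_t b = 0; b < NBLOCKS; b++)
		if (!fs->in_use[b])
			n++;
	return n;
}
```

In this model a block sitting between toy_free_block() and
toy_discard_done() is unreachable but still counted as allocated, which
is the state the rest of this report is concerned with.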

riz@ found today that if you discard a lot (he deleted ~3G) the
background discarding happens slowly enough that you can sit and wait
and watch the free space count increase.

This is partly because there's no logic for coalescing the discard
requests (more on that in a second PR) but what happens if the system
is shut down or crashes while the discards are still being processed?

Some quick inspection of the code reveals the following problems:

 * sync does not wait for pending discards to complete.
 * unmount does, but if the discards take more than five seconds it
   times out, prints a warning, and plows ahead. Given the observed
   behavior this is not long enough.

Therefore, if you unmount while there are discards pending and it
takes more than five seconds, the blocks not processed will not be
freed; and after unmount the volume will be marked clean, so those
blocks will disappear until you do a full fsck -f.

Crashing is not as big a problem, since the fsck after crashing should
repair things. (However, does fsck do fdiscard on blocks it releases?
I bet not.) I don't remember offhand if traditional fsck will fail in
preen mode if it finds unreferenced blocks, but if it does that's a
further problem for unattended crashes.

>How-To-Repeat:

As above.

>Fix:

Increasing the unmount timeout is easy: it's on line 1643 of
ffs_alloc.c in ffs_discard_finish(). Probably it should wait five
seconds, print one warning, and then wait some substantially longer
time before giving up.
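
The two-stage wait suggested above might look roughly like the following
sketch. It is plain userspace C, not a patch to ffs_discard_finish():
the timed wait is abstracted behind a callback, and timed_wait_fn,
drain_discards and the toy_queue model are all hypothetical names
invented here. The timeouts are parameters because no specific value
for the longer wait is proposed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a timed condition-variable wait: returns
 * true if every pending discard finished within timeout_sec. */
typedef bool (*timed_wait_fn)(void *ctx, int timeout_sec);

enum drain_result { DRAIN_OK, DRAIN_OK_AFTER_WARNING, DRAIN_GAVE_UP };

/* Wait briefly, warn once, then wait considerably longer before
 * giving up. */
static enum drain_result
drain_discards(timed_wait_fn wait, void *ctx, int short_sec, int long_sec)
{
	assert(wait != NULL);

	if (wait(ctx, short_sec))
		return DRAIN_OK;

	fprintf(stderr, "discards still pending after %d s; "
	    "waiting up to %d s more\n", short_sec, long_sec);

	if (wait(ctx, long_sec))
		return DRAIN_OK_AFTER_WARNING;

	fprintf(stderr, "giving up on pending discards\n");
	return DRAIN_GAVE_UP;
}

/* Toy queue for trying the logic out: completes `rate` discards per
 * simulated second. */
struct toy_queue {
	int pending;
	int rate;
};

static bool
toy_wait(void *ctx, int timeout_sec)
{
	struct toy_queue *q = ctx;
	int done = q->rate * timeout_sec;

	q->pending = (done >= q->pending) ? 0 : q->pending - done;
	return q->pending == 0;
}
```

With the toy model, a queue of 100 discards draining at one per second
does not finish in a 5 second first stage but does finish in a 120
second second stage, so only the single warning is printed.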

Having sync wait for discards to finish should also be fairly
straightforward; the logic at the top of ffs_discard_finish can be
shared (provided one is careful about races if more than one such wait
is tried at once, and that the cv_signal in ffs_discardcb is changed
to a broadcast) and it's only necessary to have the fs-level sync op
call that logic.
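
To show why the wakeup needs to be a broadcast once more than one
caller can wait, here is a userspace analogue using POSIX threads
rather than the kernel condvar interface. struct discard_state,
discard_wait, discard_done and run_waiters are hypothetical names for
this sketch, and the timeout handling is left out to keep it short.
Each waiter rechecks the pending count under the lock, and the
completion path wakes all of them when the count reaches zero; with a
plain signal only one waiter would be guaranteed to wake.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Userspace stand-in for per-mount discard state (hypothetical). */
struct discard_state {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* broadcast when pending reaches 0 */
	int pending;		/* discards still in flight */
	int returned;		/* waiters that have finished waiting */
};

/* Shared wait logic that both unmount and sync could call. */
static void
discard_wait(struct discard_state *ds)
{
	pthread_mutex_lock(&ds->lock);
	while (ds->pending > 0)
		pthread_cond_wait(&ds->drained, &ds->lock);
	ds->returned++;
	pthread_mutex_unlock(&ds->lock);
}

/* Completion callback: runs once per finished discard. */
static void
discard_done(struct discard_state *ds)
{
	pthread_mutex_lock(&ds->lock);
	assert(ds->pending > 0);
	if (--ds->pending == 0)
		pthread_cond_broadcast(&ds->drained);	/* wake every waiter */
	pthread_mutex_unlock(&ds->lock);
}

static void *
waiter_main(void *arg)
{
	discard_wait(arg);
	return NULL;
}

/* Start nwaiters concurrent waiters, complete npending discards, and
 * return how many waiters came back (-1 on bad arguments). */
static int
run_waiters(int nwaiters, int npending)
{
	struct discard_state ds;
	pthread_t tids[MAX_WAITERS];
	int i, started = 0, returned;

	if (nwaiters < 0 || nwaiters > MAX_WAITERS || npending < 0)
		return -1;

	pthread_mutex_init(&ds.lock, NULL);
	pthread_cond_init(&ds.drained, NULL);
	ds.pending = npending;
	ds.returned = 0;

	for (i = 0; i < nwaiters; i++)
		if (pthread_create(&tids[started], NULL, waiter_main, &ds) == 0)
			started++;
	for (i = 0; i < npending; i++)
		discard_done(&ds);
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	returned = ds.returned;
	pthread_cond_destroy(&ds.drained);
	pthread_mutex_destroy(&ds.lock);
	return returned;
}
```

Calling run_waiters(2, 3) models an unmount and a sync waiting at the
same time on three outstanding discards; both waiters should return
once the last discard completes.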

If fsck fails after a crash, that will be substantially harder to
deal with, but I don't think that's the case.

This is without wapbl. With wapbl all the above is still true, except
that because crashing doesn't result in fsck and (AFAIK) nothing is
done to register the pending deletions on disk, the unprocessed blocks
will disappear until a full fsck -f is done.
