tech-kern archive

Re: WAPBL fix for deallocation exhaustion + slow file removal



   Date: Sat, 1 Oct 2016 16:19:33 +0200
   From: Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>

   attached patch contains a fix to WAPBL deallocation structure
   exhaustion and panic (kern/47146), and avoids need to do slow partial
   truncates in loop, fixing kern/49175.

   [...]

   I plan to commit this sometime tomorrow.

Thanks for taking a shot at this!  But I think it needs a little more
time for review -- certainly I can't digest it in the 24 hours you're
giving.

From a quick glance at the patch, I see one bug immediately in
vfs_wapbl.c that must have been introduced in a recent change:
pool_get(PR_WAITOK) may sleep, so it is forbidden while holding a
mutex, but wapbl_register_deallocation does just that.
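
To make the concern concrete, here is a minimal userspace sketch of the
usual way around it: do the allocation that may block before taking the
lock, and free the unused record after dropping it.  This is only an
analogy and not code from the patch or from vfs_wapbl.c.  It assumes
pthread mutexes and malloc in place of the kernel's mutex and pool
primitives, and struct dealloc, struct log, log_init and
register_dealloc are hypothetical names.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a deallocation record. */
struct dealloc {
	int64_t		 blkno;
	int		 len;
	struct dealloc	*next;
};

/* Hypothetical stand-in for the log state guarded by a mutex. */
struct log {
	pthread_mutex_t	 mtx;
	struct dealloc	*head;
	int		 count;		/* records currently registered */
	int		 limit;		/* maximum records allowed */
};

void
log_init(struct log *lg, int limit)
{
	pthread_mutex_init(&lg->mtx, NULL);
	lg->head = NULL;
	lg->count = 0;
	lg->limit = limit;
}

/*
 * Register a deallocation.  Returns 0 on success, or -1 if the
 * allocation failed or the list is already at its limit.
 */
int
register_dealloc(struct log *lg, int64_t blkno, int len)
{
	struct dealloc *d;
	int error = 0;

	/* Allocate first: this is the step that may block. */
	d = malloc(sizeof(*d));
	if (d == NULL)
		return -1;
	d->blkno = blkno;
	d->len = len;

	/* Only non-blocking list manipulation happens under the lock. */
	pthread_mutex_lock(&lg->mtx);
	if (lg->count >= lg->limit) {
		error = -1;
	} else {
		d->next = lg->head;
		lg->head = d;
		lg->count++;
		d = NULL;	/* ownership passed to the list */
	}
	pthread_mutex_unlock(&lg->mtx);

	/* Release the unused record after dropping the lock. */
	free(d);
	return error;
}
```

The cost of this shape is an allocation that is thrown away whenever the
list turns out to be full, which seems acceptable if that case is rare.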

What happens if wapbl_register_deallocation fails in the middle of
ffs_truncate, and then the power goes out before the caller retries
the truncation?  It looks like we may end up committing a transaction
to disk that partially zeros a file -- and it is not immediately clear
to me who will be responsible for freeing the remaining blocks when
the system comes back up.

I don't understand why the possible failure in the fragment extension
code should not be worrying.  `Only 7 times during a file's lifetime'
says to me that if I check out and delete the NetBSD CVS repository
once, there are about two million times this can happen.  If we can't
restart the transaction safely, we need to find another way to do it.

Can you say anything about alternatives you might have considered that
don't entail adding undo/restart logic to every user of UFS_TRUNCATE
and why they were unfit?

