tech-kern archive
Re: ffs snapshots patch
On Sat, Apr 16, 2011 at 09:29:26PM +0200, Manuel Bouyer wrote:
> Hello,
> attached is a work in progress on ffs snapshot (as it's work in progress,
> some debug and instrumentation code is still present in the
> patch, no need to comment on this part :).
> The start of this work is that when working on quota, I noticed that
> taking a snapshot on a 500GB filesystem takes several minutes, and is
> O(n) in the number of persistent snapshots.
> Here are some timings on an otherwise idle 500GB filesystem (it's some brand of
> SATA2 3.5" drive attached to an AHCI controller, so it's a reasonable test
> bed for today):
> java# /usr/bin/time fssconfig fss0 /home /home/snaps/snap0
> 260.53 real 0.00 user 1.15 sys
> /home: suspended 77.873 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss1 /home /home/snaps/snap1
> 377.87 real 0.00 user 2.53 sys
> /home: suspended 206.078 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss2 /home /home/snaps/snap2
> 508.23 real 0.00 user 4.28 sys
> /home: suspended 338.534 sec, redo 1184 of 2556
> java# /usr/bin/time fssconfig fss3 /home /home/snaps/snap3
> 621.40 real 0.00 user 5.50 sys
> /home: suspended 431.154 sec, redo 1183 of 2556
>
> Suspending a filesystem for more than 7 minutes to take a snapshot makes
> persistent snapshots quite useless to me. I wonder how it would behave
> on a multi-terabyte filesystem.
>
> I looked at where the time is spent and found 2 major issues:
> 1 cgaccount() works in 2 passes: first it copies the cgs before suspending the
> filesystem; then it is called again to copy only the cgs that have been
> modified between the first copy and the filesystem suspend.
> The problem is that to copy a cg we need to allocate blocks for the snapshot
> file, which may be in a cg we just copied. This is the cause of the high
> number of cg copies (almost half of them) with the filesystem suspended.
>
> 2 while the filesystem is suspended, we want to expunge the snapshot files
> from the snapshot view (make them appear as a 0-length file).
> With ~500GB sparse files this is a lot of work.
>
> I fixed 1) by preallocating the needed blocks in snapshot_setup().
Good catch. Committed.
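
For readers following along, the effect described in 1) can be reproduced with a toy model. This is only a sketch: copy_cgs() and the alloc_cg policy are hypothetical names for illustration, and the policy used below is an arbitrary assumption, not the real FFS allocator.

```python
def copy_cgs(ncg, alloc_cg, preallocate):
    """Return the set of cgs whose first copy went stale.

    alloc_cg(i) is a hypothetical allocator policy: the cg in which the
    snapshot-file block that will hold the copy of cg i gets allocated.
    """
    copied = set()
    redo = set()

    def allocate(i):
        target = alloc_cg(i)
        # Allocating a block changes target's bitmap; if target was
        # already copied, that copy is stale and must be redone.
        if target in copied:
            redo.add(target)

    if preallocate:
        # Allocate everything up front: nothing has been copied yet,
        # so no copy can go stale.
        for i in range(ncg):
            allocate(i)
    for i in range(ncg):
        if not preallocate:
            allocate(i)
        copied.add(i)
    return redo


if __name__ == "__main__":
    policy = lambda i: i // 2  # assumed policy, for illustration only
    print(len(copy_cgs(2556, policy, preallocate=False)))
    print(len(copy_cgs(2556, policy, preallocate=True)))
```

With this assumed policy half of the groups need a second copy when blocks are allocated on demand, and none do when they are allocated before the first pass.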
> Fixing 2) is trickier. To avoid the heavy writes to the snapshot file
> with the fs suspended, the snapshot appears with its real length and
> blocks at the time of creation, but is marked invalid (only the
> inode block needs to be copied, and this can be done before suspending
> the fs). Now BLK_SNAP should never be seen as a block number, and we skip
> ffs_copyonwrite() if the write is to a snapshot inode.
I strongly object here. There are good reasons to expunge old snapshots.
Even if it were done right, without deadlocks and locking-against-self,
the resulting snapshot loses at least two properties:
- A snapshot is considered stable. Whenever you read a block you get
the same contents. Allowing old snapshots to exist but not running
copy-on-write means these blocks will change their contents.
- A snapshot will fsck clean. It is impossible to change fsck_ffs
to check a snapshot as these old snapshots' indirect blocks now will
contain garbage.
You cannot copy blocks before suspension without rewriting them once
the file system is suspended.
The check in ffs_copyonwrite() will only work as long as the old
snapshot exists. As soon as it gets removed we will run COW
on the blocks used by the old snapshot.
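
The stability point can be illustrated with a minimal sketch. Everything here is a toy under stated assumptions: Snapshot, before_write() and write() are hypothetical names, the "disk" is a plain dict, and none of this is the actual ffs_copyonwrite() code.

```python
class Snapshot:
    """Toy copy-on-write snapshot over a dict mapping block number to data."""

    def __init__(self, disk, skip_cow=frozenset()):
        self.disk = disk
        self.saved = {}           # old contents preserved by COW
        self.skip_cow = skip_cow  # blocks for which COW is not run

    def before_write(self, blkno):
        # Preserve the old contents once, unless COW is skipped.
        if blkno in self.skip_cow or blkno in self.saved:
            return
        self.saved[blkno] = self.disk[blkno]

    def read(self, blkno):
        # Preserved copy if there is one, otherwise the live block.
        return self.saved.get(blkno, self.disk[blkno])


def write(disk, snapshots, blkno, data):
    # Every active snapshot gets a chance to save the old block first.
    for snap in snapshots:
        snap.before_write(blkno)
    disk[blkno] = data


if __name__ == "__main__":
    disk = {7: "old"}
    stable = Snapshot(disk)
    unstable = Snapshot(disk, skip_cow={7})  # block 7 belongs to an old snapshot
    write(disk, [stable, unstable], 7, "new")
    print(stable.read(7), unstable.read(7))
```

The snapshot that runs copy-on-write keeps returning the contents from creation time, while the one that skips it for the old snapshot's blocks returns whatever was written later.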
> With these changes the times are much more reasonable:
> /usr/bin/time fssconfig fss0 /home /home/snaps/snap0
> 299.68 real 0.00 user 1.10 sys
> /home: suspended 0.310 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss1 /home /home/snaps/snap1
> 188.10 real 0.00 user 0.86 sys
> /home: suspended 0.270 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss2 /home /home/snaps/snap2
> 169.78 real 0.00 user 0.95 sys
> /home: suspended 0.450 sec, redo 0 of 2556
> /usr/bin/time fssconfig fss3 /home /home/snaps/snap3
> 172.39 real 0.00 user 0.99 sys
> /home: suspended 0.300 sec, redo 0 of 2556
>
> This seems to work; one issue with this patch is that the block
> count for the snapshot inode, and block summary information (the
> second being probably a consequence of the first) appear wrong when
> running fsck against a snapshot. I believe this is fixable, but
> I've not yet found where the information mismatch is coming from.
>
> comments ?
>
> PS: I'm away from computers for one week, so don't expect replies to
> your comments before next sunday.
>
> --
> Manuel Bouyer <bouyer%antioche.eu.org@localhost>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
--
Juergen Hannken-Illjes - hannken%eis.cs.tu-bs.de@localhost - TU Braunschweig
(Germany)