tech-kern archive
Re: strange zfs size allocation data
On Sun, Jul 07, 2024 at 02:07:40PM -0400, Greg Troxel wrote:
> I ran into a test failure with bup, where it was restoring a sparse file
> and trying to validate the resulting disk usage. It turns out that on
> zfs (NetBSD 10), when you write a file, it shows as using 1 block and
> then some seconds later shows as using the right amount.
When you say "validate the resulting disk usage" and "using 1 block" what
do you mean, exactly? If the file is sparse, I can't see how there's any
bug unless the wrong st_size is returned by stat() or the wrong length
returned by lseek().
du counts allocated blocks as reported by stat(). A sparse file might
legitimately report 0, 1, or any other value, even values that exceed
(st_size / 512); st_blocks is counted in 512-byte units on NetBSD, not in
units of st_blksize. And the number of allocated blocks can absolutely
change even while st_size stays the same - consider a filesystem with
background deduplication or compression, both of which some variants of
ZFS have, but ZFS is not the only filesystem with these features.
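To make the distinction concrete, here is a minimal sketch that prints a
file's logical length next to its allocated space. The file name
"sparse.demo" is hypothetical, and the block size is spelled out in bytes
(rather than bs=1m) only so the same command also works with non-BSD dd.
Only the logical length is something a caller can rely on; the allocated
figure depends on the filesystem and on when you ask.

```shell
#!/bin/sh
# Sketch: logical length vs. allocated space for a sparse file.
# "sparse.demo" is a hypothetical name used only for illustration.
f=sparse.demo
rm -f "$f"

# Skip 4 MiB (leaving a hole), then write 1 MiB of random data.
dd if=/dev/urandom of="$f" bs=1048576 seek=4 count=1 2>/dev/null

# Logical length in bytes; this is what st_size reports.
logical=$(wc -c < "$f" | tr -d ' ')

# Allocated space in KiB, derived from st_blocks. This value is
# filesystem-dependent and may change after the write has returned.
allocated=$(du -k "$f" | awk '{print $1}')

echo "logical bytes: $logical"
echo "allocated KiB: $allocated"
```

The logical length is always 5 MiB here. The allocated figure may be
smaller than, equal to, or larger than that, and may differ between two
runs of du on the same unchanged file.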
If bup is relying on some particular block allocation behavior, that seems
like a bug.
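A test that wants to confirm a restore succeeded could compare length and
contents instead of disk usage. The following is only a sketch under that
assumption; it is not how bup's test suite works, the file names are
hypothetical, and cp stands in for the real restore step.

```shell
#!/bin/sh
# Sketch: validate a restored file by length and content, not by du.
# "orig.demo" and "restored.demo" are hypothetical names.
orig=orig.demo
restored=restored.demo
rm -f "$orig" "$restored"

# Source file: a 2 MiB hole followed by 1 MiB of random data.
dd if=/dev/urandom of="$orig" bs=1048576 seek=2 count=1 2>/dev/null

# Stand-in for the restore step under test.
cp "$orig" "$restored"

same_size=no
same_data=no

# Compare logical lengths (st_size), which must match exactly.
if [ "$(wc -c < "$orig" | tr -d ' ')" -eq "$(wc -c < "$restored" | tr -d ' ')" ]; then
    same_size=yes
fi

# Compare contents byte for byte; holes read back as zeros.
if cmp -s "$orig" "$restored"; then
    same_data=yes
fi

echo "size match: $same_size, data match: $same_data"
```

Neither check depends on how many blocks the filesystem has allocated at
the moment the test runs.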
>
> So:
>
> - why is it happening?
> - is this a bug?
> - if we think it's a bug, is it feasible to fix?
>
> A simple program to create files, n empty megabytes followed by 1 real
> megabyte, for n in 0..9. And then to 'du' the file, every 1s for 30s,
> not worrying about precise timing.
>
> I have a big ssd which is mostly a zfs type partition, and a pool with
> just that. Nothing fancy.
>
> ----------------------------------------
> #!/bin/sh
>
> for i in $(seq 0 9); do
>     OUT=seek$i
>     rm -rf ${OUT} ${OUT}.size
>     dd if=/dev/urandom seek=$i bs=1m count=1 of=${OUT} 2> /dev/null
>
>     for s in $(seq 0 30); do
>         (echo -n "$s: "; du ${OUT}) >> $OUT.size
>         sleep 1
>     done
>
> done
> ----------------------------------------
>
> leads to ('head -6' shown, since that's sufficient to understand):
>
> ==> seek0.size <==
> 0: 1 seek0
> 1: 1 seek0
> 2: 1 seek0
> 3: 1027 seek0
> 4: 1027 seek0
> 5: 1027 seek0
>
> ==> seek1.size <==
> 0: 1 seek1
> 1: 1 seek1
> 2: 1027 seek1
> 3: 1027 seek1
> 4: 1027 seek1
> 5: 1027 seek1
>
> ==> seek2.size <==
> 0: 1 seek2
> 1: 1027 seek2
> 2: 1027 seek2
> 3: 1027 seek2
> 4: 1027 seek2
> 5: 1027 seek2
>
> ==> seek3.size <==
> 0: 1 seek3
> 1: 1 seek3
> 2: 1 seek3
> 3: 1 seek3
> 4: 1027 seek3
> 5: 1027 seek3
>
> ==> seek4.size <==
> 0: 1 seek4
> 1: 1 seek4
> 2: 1 seek4
> 3: 1027 seek4
> 4: 1027 seek4
> 5: 1027 seek4
>
> ==> seek5.size <==
> 0: 1 seek5
> 1: 1 seek5
> 2: 1027 seek5
> 3: 1027 seek5
> 4: 1027 seek5
> 5: 1027 seek5
>
> ==> seek6.size <==
> 0: 1 seek6
> 1: 1 seek6
> 2: 1 seek6
> 3: 1 seek6
> 4: 1 seek6
> 5: 1027 seek6
>
> ==> seek7.size <==
> 0: 1 seek7
> 1: 1 seek7
> 2: 1 seek7
> 3: 1 seek7
> 4: 1027 seek7
> 5: 1027 seek7
>
> ==> seek8.size <==
> 0: 1 seek8
> 1: 1 seek8
> 2: 1 seek8
> 3: 1027 seek8
> 4: 1027 seek8
> 5: 1027 seek8
>
> ==> seek9.size <==
> 0: 1 seek9
> 1: 1027 seek9
> 2: 1027 seek9
> 3: 1027 seek9
> 4: 1027 seek9
> 5: 1027 seek9
--
Thor Lancelot Simon tls%panix.com@localhost
"The inconsistency is startling, though admittedly, if consistency is to
be abandoned or transcended, there is no problem." - Noam Chomsky