tech-kern archive

Re: strange zfs size allocation data



On Sun, Jul 07, 2024 at 02:07:40PM -0400, Greg Troxel wrote:
> I ran into a test failure with bup, where it was restoring a sparse file
> and trying to validate the resulting disk usage.  It turns out that on
> zfs (NetBSD 10), when you write a file, it shows as using 1 block and
> then some seconds later shows as using the right amount.

When you say "validate the resulting disk usage" and "using 1 block" what
do you mean, exactly?  If the file is sparse, I can't see how there's any
bug unless the wrong st_size is returned by stat() or the wrong length
returned by lseek().

du counts allocated blocks as reported in st_blocks by stat().  A sparse
file might legitimately report 0, 1, or any other value, even values that
exceed (st_size / 512).  And the number of allocated blocks can absolutely
change even while st_size stays the same.  Consider a filesystem with
background deduplication or compression: some variants of ZFS have both,
and ZFS is not the only filesystem with these features.

If bup is relying on some particular block allocation behavior, that seems
like a bug.
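
To make the distinction concrete, here is a minimal sketch, not anything
bup actually does.  It creates a sparse file and prints its logical
length next to the space du reports for it.  The helper names
logical_size and allocated_kib are made up for illustration, and the
allocated figure is whatever the filesystem reports at that moment.

```sh
#!/bin/sh
# Sketch: logical size vs. allocated space for a sparse file.
# Helper names are hypothetical, chosen only for this example.

# Logical length in bytes (the value stat() returns in st_size).
logical_size() {
    wc -c < "$1" | tr -d ' '
}

# Allocated space in KiB as du reports it (derived from st_blocks).
# This may legitimately differ between filesystems and change over time.
allocated_kib() {
    du -k "$1" | cut -f1
}

f=$(mktemp)

# Leave a 1 MiB hole, then write 1 KiB of data after it.
dd if=/dev/zero of="$f" bs=1024 seek=1024 count=1 2>/dev/null

echo "logical size: $(logical_size "$f") bytes"
echo "allocated:    $(allocated_kib "$f") KiB"

rm -f "$f"
```

A restore check that compares the logical size and the file contents is
portable; one that compares the allocated figure depends on the
filesystem and on when it is sampled.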

> 
> So:
> 
>   - why is it happening?
>   - is this a bug?
>   - if we think it's a bug, is it feasible to fix?
> 
> A simple program to create files, n empty megabytes followed by 1 real
> megabyte, for n in 0..9.  And then to 'du' the file, every 1s for 30s,
> not worrying about precise timing.
> 
> I have a big ssd which is mostly a zfs type partition, and a pool with
> just that.  Nothing fancy.
> 
> ----------------------------------------
> #!/bin/sh
> 
> for i in $(seq 0 9); do
>     OUT=seek$i
>     rm -rf ${OUT} ${OUT}.size
>     dd if=/dev/urandom seek=$i bs=1m count=1 of=${OUT} 2> /dev/null
> 
>     for s in $(seq 0 30); do
> 	(echo -n "$s:	"; du ${OUT}) >> $OUT.size
> 	sleep 1
>     done
> 
> done
> ----------------------------------------
> 
> leads to ('head -6' shown, since that's sufficient to understand):
> 
> ==> seek0.size <==
> 0:	1	seek0
> 1:	1	seek0
> 2:	1	seek0
> 3:	1027	seek0
> 4:	1027	seek0
> 5:	1027	seek0
> 
> ==> seek1.size <==
> 0:	1	seek1
> 1:	1	seek1
> 2:	1027	seek1
> 3:	1027	seek1
> 4:	1027	seek1
> 5:	1027	seek1
> 
> ==> seek2.size <==
> 0:	1	seek2
> 1:	1027	seek2
> 2:	1027	seek2
> 3:	1027	seek2
> 4:	1027	seek2
> 5:	1027	seek2
> 
> ==> seek3.size <==
> 0:	1	seek3
> 1:	1	seek3
> 2:	1	seek3
> 3:	1	seek3
> 4:	1027	seek3
> 5:	1027	seek3
> 
> ==> seek4.size <==
> 0:	1	seek4
> 1:	1	seek4
> 2:	1	seek4
> 3:	1027	seek4
> 4:	1027	seek4
> 5:	1027	seek4
> 
> ==> seek5.size <==
> 0:	1	seek5
> 1:	1	seek5
> 2:	1027	seek5
> 3:	1027	seek5
> 4:	1027	seek5
> 5:	1027	seek5
> 
> ==> seek6.size <==
> 0:	1	seek6
> 1:	1	seek6
> 2:	1	seek6
> 3:	1	seek6
> 4:	1	seek6
> 5:	1027	seek6
> 
> ==> seek7.size <==
> 0:	1	seek7
> 1:	1	seek7
> 2:	1	seek7
> 3:	1	seek7
> 4:	1027	seek7
> 5:	1027	seek7
> 
> ==> seek8.size <==
> 0:	1	seek8
> 1:	1	seek8
> 2:	1	seek8
> 3:	1027	seek8
> 4:	1027	seek8
> 5:	1027	seek8
> 
> ==> seek9.size <==
> 0:	1	seek9
> 1:	1027	seek9
> 2:	1027	seek9
> 3:	1027	seek9
> 4:	1027	seek9
> 5:	1027	seek9

-- 
  Thor Lancelot Simon	                                     tls%panix.com@localhost

  "The inconsistency is startling, though admittedly, if consistency is to
   be abandoned or transcended, there is no problem."	      - Noam Chomsky

