tech-misc archive

Re: du -m is off

On 22-Mar-08, at 11:00 AM, Jeremy C. Reed wrote:
> I noticed that a file of 10485760 bytes would be listed as 11MB by du -m.
> The -h "humanize" option is correct and lists it as 10MB.
>
> du.c uses:
>
>                 (void)printf("%lld\t%s\n",
>                     (long long)howmany(blocks, (int64_t)blocksize),
>
> This works better for me:
>
> printf("%ld\n", (long)((int64_t)blocks / blocksize));
>
> But that doesn't help for files smaller than a megabyte, which will then
> be displayed as zero.
>
> As another example, a file of 3372684 bytes is 3.21644MB. du -m would
> round up to 4, whereas my printf above prints "3".
>
> I am not sure what the correct behaviour should be.
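A minimal sketch of the two rounding rules quoted above, applied to the byte counts from the question. The helper names are hypothetical: units_round_up() uses the same (x + unit - 1) / unit formula as the BSD howmany() macro that du.c calls, and units_truncate() matches the proposed division. For simplicity it assumes the allocated size equals the file size (du itself counts blocks, not bytes) and takes 1MB as 1048576 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Round up: any partial unit counts as a whole one.  This is the same
 * formula as the BSD howmany(x, y) macro used in du.c. */
static int64_t
units_round_up(int64_t x, int64_t unit)
{
	assert(x >= 0 && unit > 0);
	return (x + (unit - 1)) / unit;
}

/* Truncate: partial units are dropped, as in the proposed printf. */
static int64_t
units_truncate(int64_t x, int64_t unit)
{
	assert(x >= 0 && unit > 0);
	return x / unit;
}
```

With a 1048576-byte unit, 3372684 bytes rounds up to 4 but truncates to 3, and a one-byte file gives 1 versus 0. Both rules give 10 for 10485760 bytes, so the 11 that du -m reported cannot come from the byte count alone.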

The first hint is to always think about "blocks", not bytes.

I think there's an argument to be made that when dealing with units of allocation (and available space) the rounding should always be done so that any partial unit goes to the next increment when counting allocated units, and of course vice versa when counting free units: any partial unit goes to the next decrement. This is effectively how df portrayed free blocks way back when filesystems were simple and you simply couldn't squeeze another one-byte file onto a filesystem with 511 bytes free, no matter how hard you tried. 511 bytes is 99.8% of a 512-byte block, so in normal arithmetic the free block count would be rounded up (to 1) if there were 511 bytes free, yet df showed zero blocks free. I think the same logic should apply no matter what units df is displaying its counts in: if there's less than 1MB free and we're displaying in MB, then there are zero MB free.
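To make the df half of this argument concrete, here is a sketch contrasting round-down for free space with round-to-nearest. The helper names are hypothetical and this is not df's actual code. With 512-byte units and 511 bytes free, round-down reports 0 usable blocks while round-to-nearest reports 1, a block that could never actually be used.

```c
#include <assert.h>
#include <stdint.h>

/* Free space: only whole units are usable, so always round down. */
static int64_t
free_units(int64_t free_bytes, int64_t unit)
{
	assert(free_bytes >= 0 && unit > 0);
	return free_bytes / unit;
}

/* "Normal arithmetic": round to the nearest unit, halves rounding up.
 * Shown only for contrast; it over-reports usable free space. */
static int64_t
nearest_units(int64_t bytes, int64_t unit)
{
	assert(bytes >= 0 && unit > 0);
	return (bytes + unit / 2) / unit;
}
```

The same rule carries over to larger display units: 1048575 bytes free is 0MB free by round-down, but 1MB by round-to-nearest.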

Similarly for du: files of less than 1MB, even if they're only one byte long, should be shown as using one display-sized unit, so with 'du -m' the one-byte file should show as using 1MB. The same goes for that 3372684-byte file: it should indeed show as 4MB with "du -m", even before considering any finer details.

Now, the problem with your 10485760-byte file is easy to understand if you first take a peek at just how much space has been allocated for that file (use "stat -s"), keeping in mind that "du" shows disk usage, not file size. For example, when I create an exactly 10MB file I get a file that uses 20512 "blocks" (i.e. 512-byte blocks) of disk space. Multiply that back out and the disk space used is 10502144 bytes, i.e. 10.015625MB, so by the logic above the file does indeed use what must be counted as 11MB of disk space in 1MB units. This is because the filesystem allocates space in 16KB blocks, not 512B, and your 10MB file actually needs 641 filesystem (16KB) units of allocation (10502144 / 16384 = 641): 640 for the data itself plus one more, presumably the file's indirect block, which is charged to the file. So it really does use more than 10MB of filesystem space.
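The arithmetic above can be checked with a short sketch that works from st_blocks rather than st_size. It assumes st_blocks is counted in 512-byte units, as on NetBSD and Linux (POSIX leaves the unit unspecified); the helper names and the ST_BLOCK_SIZE constant are hypothetical. path_st_blocks() reads the value with stat(2), which is also where "stat -s" gets its st_blocks figure.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/stat.h>

/* Assumption: st_blocks is reported in 512-byte units, independent of
 * the filesystem's own (e.g. 16KB) block size. */
#define ST_BLOCK_SIZE 512

/* Disk usage in bytes as du sees it: allocated blocks, not file size. */
static int64_t
allocated_bytes(int64_t st_blocks)
{
	assert(st_blocks >= 0);
	return st_blocks * ST_BLOCK_SIZE;
}

/* du -m style: round the allocated space up to whole megabytes. */
static int64_t
du_megabytes(int64_t st_blocks)
{
	const int64_t mb = 1048576;

	return (allocated_bytes(st_blocks) + (mb - 1)) / mb;
}

/* Look up a path's allocation; returns -1 if stat(2) fails. */
static int64_t
path_st_blocks(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) == -1)
		return -1;
	return (int64_t)sb.st_blocks;
}
```

For the 20512-block file described above, allocated_bytes() gives 10502144 and du_megabytes() gives 11, while a file allocated exactly 20480 blocks (10485760 bytes) would give 10.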

That could also suggest that "du -h" is in some ways wrong too, at least with respect to the intent of showing disk usage (as opposed to file size) in whole display-sized units. I'm not sure if it can or should be fixed, though; after all, it is just a more easily human-interpretable measure of usage, and by definition it always glosses over the finer details.

--
                                        Greg A. Woods; Planix, Inc.
                                        <woods%planix.ca@localhost>