Subject: Re: NetBSD, apple fibre-channel card & 2.8TB Xserve-RAID
To: der Mouse <mouse@Rodents.Montreal.QC.CA>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 12/04/2004 00:50:30
[ On Saturday, December 4, 2004 at 00:16:06 (-0500), der Mouse wrote: ]
> Subject: Re: NetBSD, apple fibre-channel card & 2.8TB Xserve-RAID
>
> >> Of course since the real limit in disklabels and FFS is 2^32 sectors
> >> per partition and per filesystem
> 
> I've just run into some rather disturbing symptoms that lead me to
> suspect the actual practical limit may be 1TB - ie, 2^31 sectors (of
> half a K each).

Indeed that does seem to be the limit (err, minus one more actually, at
least on 2's complement machines).

My fingers were too quick to type the "natural" 2^32 number.

>  Is anyone using filesystems over 1TB successfully?

As far as I can tell it is simply not possible on NetBSD, at least not
on any released version.   :-)


> How about with FFSv1?

Did you mean "v2"?

Either way there's still the issue of the disklabel, both in-kernel and
on-disk as far as I can tell at the moment, only being able to represent
2^31-1 sectors, so unless one can use hardware (and drivers) with a
larger sector size, e.g. of 1K or 2K, etc., it's just not possible to
exceed a 1TB limit to filesystem size.
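
[A minimal sketch of the arithmetic behind that limit, assuming the
sector count lives in a signed 32-bit field as described above. The
names MAX_SECTORS_S32 and max_partition_bytes() are hypothetical and
are not taken from the NetBSD sources.]

```c
#include <assert.h>
#include <stdint.h>

/* Largest sector count a signed 32-bit field can hold: 2^31 - 1. */
#define MAX_SECTORS_S32 ((uint64_t)INT32_MAX)

/*
 * Bytes addressable through such a field for a given sector size.
 * The multiplication is done in 64 bits so it cannot overflow here.
 */
static uint64_t
max_partition_bytes(uint32_t sector_size)
{
    assert(sector_size > 0);
    return MAX_SECTORS_S32 * (uint64_t)sector_size;
}
```

With 512-byte sectors this gives 2^40 - 512 bytes, i.e. just under
1TB; with 2K sectors the same field would cover just under 4TB.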

I tried ignoring the fact that the disklabel userland code reported
negative numbers by trying to newfs bigger (1.4TB) filesystems using
2KB, 4KB, and 8KB frags, but nothing worked.  Starting to look for the
signed/unsigned problems just drove me crazy -- it would/will be easier
to just change everything to use uint64_t than to try to find the key
places where things are getting mucked up now.
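
[A small sketch of how a sector count for a filesystem of that size
turns negative once it passes through a signed 32-bit variable. The
helper names are hypothetical, and the example count assumes 1.4TB
means 1.4 * 10^12 bytes on 512-byte sectors; this is not the actual
disklabel code.]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Does a sector count fit in a signed 32-bit field? */
static int
fits_signed32(uint64_t nsectors)
{
    return nsectors <= (uint64_t)INT32_MAX;
}

/*
 * What a signed 32-bit field would show for a given sector count.
 * The value is truncated to 32 bits and the bits are reinterpreted
 * via memcpy, which avoids implementation-defined conversions.
 */
static int32_t
as_signed32(uint64_t nsectors)
{
    uint32_t low = (uint32_t)nsectors;
    int32_t s;

    assert(sizeof s == sizeof low);
    memcpy(&s, &low, sizeof s);
    return s;
}
```

For 2734375000 sectors (about 1.4TB), fits_signed32() returns 0 and
as_signed32() yields -1560592296, the kind of negative number a
userland tool would then print.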


> we suspect there
> is something going wonky at the 1TB mark, either it's reading/writing
> the wrong sectors, or the buffer cache is getting confused, or some
> such.

At, or beyond, the 2^31-1 sector mark, do you think?

I've not yet seen any problems with a 2^31-1 sector filesystem on 1.6.x
running on alpha, though I've not yet filled the filesystem to capacity
and checked the data, nor done much extensive work with it, just a few
simple single-user tests....  ;-)

Since I have two such identical logical devices I suppose I could fill
one with many big files full of random data, then copy it across to the
other via the filesystem interface (e.g. cp -R), and then compare their
contents (e.g. with "rsync -rcnv", since without -c rsync compares only
size and modification time).

>  The most likely candidate to my mind is some kind of 32-/64-bit
> bug, possibly sign-extending when it should be zero-extending, or maybe
> using a 32-bit datatype (maybe inadvertently) where a 64-bit type is
> called for.

Perhaps such a bug is avoided on alpha and other LP64 systems?
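
[A sketch of the sign-extension versus zero-extension difference
being discussed, assuming a block number kept in a 32-bit signed
variable is widened to a 64-bit byte offset. Both function names are
hypothetical illustrations, not existing kernel routines.]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Widening the signed value directly sign-extends it, so any block
 * number at or past 2^31 becomes a negative byte offset.
 */
static int64_t
byte_offset_sign_extended(int32_t blkno, uint32_t sector_size)
{
    assert(sector_size > 0);
    return (int64_t)blkno * (int64_t)sector_size;
}

/*
 * Converting to unsigned 32-bit first zero-extends instead, which
 * recovers the intended offset for block numbers up to 2^32 - 1.
 */
static int64_t
byte_offset_zero_extended(int32_t blkno, uint32_t sector_size)
{
    assert(sector_size > 0);
    return (int64_t)(uint32_t)blkno * (int64_t)sector_size;
}
```

For the bit pattern of sector 2^31 (INT32_MIN as a signed value) and
512-byte sectors, the first function returns -2^40 and the second
+2^40. On an LP64 system a block number held in a "long" never takes
the 32-bit detour, which may be why the problem would not show there.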

> But much of the weirdness I recently reported on tech-kern with
> directories that appear fine to fsck but the kernel acts bizarre with
> can be rather neatly explained by assuming such bugs in the block
> device code paths, or perhaps the filesystem disk-interface code paths.

If you can suggest any way to reproduce such weirdness I can certainly
try that on the alpha es40 I'm preparing -- or give you access to try
it yourself....  :-)


-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>