current-users: Re: File system

Subject: Re: File system > 1 Terabyte
To: Jason Thorpe <thorpej@shagadelic.org>
From: Greg A. Woods <woods@weird.com>
List: current-users
Date: 10/01/2004 18:45:18
[ On Thursday, September 30, 2004 at 14:35:06 (-0700), Jason Thorpe wrote: ]
> Subject: Re: File system > 1 Terabyte
>
> But because of the current disklabel issues, you're stuck at 2TB max.

OK, so now that I have a real example or two to play with (two 1.4TB
arrays on a maxed out Xserve RAID connected to an Alpha ES40), I'm
still/more confused.

Disklabel (the program) apparently accounts for sectors in signed 32-bit
integers, even on the alpha, and refuses to read a label specification
that gives a device more than 2^31-1 sectors:

	type: SCSI
	disk: Xserve RAID
	label: fictitious
	flags:
	bytes/sector: 512
	sectors/track: 128
	tracks/cylinder: 128
	sectors/cylinder: 16384
	cylinders: 179526
	total sectors: 2941353984
	rpm: 3600
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 
	
	3 partitions:
	#        size    offset     fstype  [fsize bsize cpg/sgs]
	 c: -1353613312         0     unused      0     0         # (Cyl.    0 - 179525)

	[console]<@> # disklabel -e sd2
	[[ .... ]]
	disklabel: line 11: bad total sectors: 2941353984
	disklabel: line 22: bad partition size: -1353613312
	disklabel: line 23: bad partition size: -1353613312
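
The negative partition size above is what the real sector count looks like when its low 32 bits are read back as a signed integer. The following is only a sketch of that arithmetic, not disklabel's actual code; the helper name `as_int32` is made up for illustration, and the geometry numbers are taken from the label shown above.

```python
import ctypes

def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer.

    Hypothetical helper for illustration; ctypes does no overflow
    checking, so the value simply wraps.
    """
    return ctypes.c_int32(value).value

# Geometry from the label: 128 sectors/track * 128 tracks/cylinder
sectors_per_cylinder = 128 * 128
total_sectors = 179526 * sectors_per_cylinder

print(total_sectors)                # 2941353984
print(as_int32(total_sectors))      # -1353613312
print(total_sectors > 2**31 - 1)    # True: does not fit in a signed 32-bit field
```

The wrapped value matches the size printed for partition c, which is consistent with a signed 32-bit sector count somewhere in the path.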


Faking the label out with 2^31-1 (2,147,483,647) sectors ends up with
this working label:

	[console]<@> # disklabel -r sd2         
	# /dev/rsd2c:
	type: SCSI
	disk: Xserve RAID
	label: fictitious
	flags:
	bytes/sector: 512
	sectors/track: 128
	tracks/cylinder: 128
	sectors/cylinder: 16384
	cylinders: 179526
	total sectors: 2147483647
	rpm: 3600
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 
	
	3 partitions:
	#        size    offset     fstype  [fsize bsize cpg/sgs]
	 a: 2147483647         0     4.2BSD   2048 16384   160   # (Cyl.    0 - 131071*)
	 c: 2147483647         0     unused      0     0         # (Cyl.    0 - 131071*)

which is of course just 1.0TB, even with the 2k frag size (i.e. no
matter how you cut it :-).

The only way I can see to get to even just 2.0TB is to use a
1024-byte/sector disk -- but can NetBSD-1.6.x use that?  And can the
Xserve RAID do it?
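
As a rough check of the capacity figures above, the sketch below computes the largest device a signed 32-bit sector count can describe at 512 and 1024 bytes/sector, and compares that with the array's real size. The function name `max_bytes` is hypothetical, and sizes are in binary terabytes (2^40 bytes), which is an assumption about what "TB" means here.

```python
INT32_MAX = 2**31 - 1   # largest sector count a signed 32-bit field can hold
TIB = 2**40             # binary terabyte, assumed unit for "TB"

def max_bytes(bytes_per_sector: int, max_sectors: int = INT32_MAX) -> int:
    """Largest addressable size in bytes for a given sector size."""
    return bytes_per_sector * max_sectors

for bps in (512, 1024):
    # Ceiling per sector size, in TiB
    print(bps, round(max_bytes(bps) / TIB, 3))

# Actual array size from the label: 2941353984 sectors of 512 bytes
array_bytes = 2941353984 * 512
print(round(array_bytes / TIB, 3))
```

At 512 bytes/sector the ceiling lands just under 1 TiB, so the array does not fit; at 1024 bytes/sector the ceiling doubles and the same array would need only half as many sectors.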

Or is it just disklabel's I/O that's broken?  I was able to create a
label of sorts claiming the whole partition with "disklabel -i -I" but
then newfs barfs:

	[console]<@> # disklabel -r sd3       
	# /dev/rsd3c:
	type: SCSI
	disk: Xserve RAID
	label: fictitious
	flags:
	bytes/sector: 512
	sectors/track: 128
	tracks/cylinder: 128
	sectors/cylinder: 16384
	cylinders: 179526
	total sectors: 2941353984
	rpm: 3600
	interleave: 1
	trackskew: 0
	cylinderskew: 0
	headswitch: 0           # microseconds
	track-to-track seek: 0  # microseconds
	drivedata: 0 
	
	3 partitions:
	#        size    offset     fstype  [fsize bsize cpg/sgs]
	 c: -1353613312         0     4.2BSD      0     0     0   # (Cyl.    0 - 179525)

	[console]<@> # newfs /dev/rsd3c
	preposterous size -1353613312


FYI, newfs on that other device goes as fast as the 9600bps console can
scroll and I end up with:

	[console]<@> # df -ki 
	Filesystem  1K-blocks     Used    Avail %Cap    iUsed  iAvail %iCap Mounted on
	/dev/sd1a     6048498   418944  5327128   7%    10563   746683   1% /
	/dev/sd1d     9439234      378  8966894   0%      126  1180288   0% /var
	/dev/sd2a    1073508690        2 1019833252   0%        1   621437   0% /mnt

I'll probably need a few more inodes, esp. for a mail partition....  :-)

Oh....

	$ expr 1073508690 - 1019833252
	53675438

wow.
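
For scale, the gap between the 1K-blocks and Avail columns can be expressed as a fraction of the filesystem. This is a sketch using only the numbers from the df output above. Reading the result as the FFS minfree reserve (the value set with newfs -m or tunefs -m) is an assumption, not something the output itself states.

```python
# Figures for /dev/sd2a from the df -ki output, in 1K blocks
total_kb = 1073508690
used_kb = 2
avail_kb = 1019833252

# Space that is neither used nor offered to ordinary users
reserved_kb = total_kb - used_kb - avail_kb
reserved_pct = 100 * reserved_kb / total_kb

print(reserved_kb)
print(round(reserved_pct, 2))
```

The gap comes out at almost exactly 5% of the total. If that really is the minfree reserve, then it is a tunable holdback rather than cylinder-group or inode overhead, and changing the number of cylinder groups would not be expected to shrink it.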

BTW, what was that trick for avoiding so much "wastage" again?  Even
fewer cylinder groups (and more inodes per group)?

-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>