Subject: Block Mail
To: None <current-users@sun-lamp.cs.berkeley.edu>
From: Mark Gooderum <gooderum@sctc.com>
List: current-users
Date: 08/26/1994 14:45:09
> From: "Michael L. Hitch" <osymh@gemini.oscs.montana.edu>
> Date: Wed, 24 Aug 1994 20:01:03 -0600
> To: current-users@sun-lamp.cs.berkeley.edu
> Subject: Filesystem block/device block size discrepancy
> 
>   Has anyone been running (or attempted to run) an ffs file system on a
> disk with other than 512 byte sectors?  ...
 
I've had this problem; I posted mail earlier about problems with a SCSI
hard drive with 1024-byte blocks.
 
>   Both the SCSI hard disk (sd*) and CDROM (cd*) drivers take the b_blkno
> field as the block number of a 512-byte (DEV_BSIZE) block and convert it
> to the actual block number of the physical block of the disk....
 
This jibes with the behavior I've seen.  Disklabel writes the label
and reads it back okay, but fsck and newfs barf on the raw device about
halfway through the disk (because the block numbers are doubled, they
run off the end of the disk).
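
A minimal user-space sketch of the unit arithmetic behind that symptom,
assuming a disk with 1024-byte sectors.  The helper names here are
hypothetical and this is not the sd driver code; only DEV_BSIZE and the
byte-to-block shift performed by btodb() come from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define DEV_BSIZE  512   /* kernel's nominal block size, as in the thread */
#define DEV_BSHIFT 9     /* log2(DEV_BSIZE) */

/* Byte offset -> DEV_BSIZE block number (the shift that btodb() performs). */
static int64_t bytes_to_dev_blocks(int64_t byte_off)
{
    return byte_off >> DEV_BSHIFT;
}

/* DEV_BSIZE block number -> physical sector number (hypothetical helper). */
static int64_t dev_blocks_to_sector(int64_t blkno, int sector_size)
{
    /* Only sector sizes that are whole multiples of DEV_BSIZE make sense. */
    assert(sector_size >= DEV_BSIZE && sector_size % DEV_BSIZE == 0);
    return blkno / (sector_size / DEV_BSIZE);
}

/* Is this sector number actually on the disk? */
static int sector_in_range(int64_t sector, int64_t nsectors)
{
    return sector >= 0 && sector < nsectors;
}
```

Take a hypothetical disk of 1000 sectors of 1024 bytes each.  Byte offset
512000 (the midpoint) is DEV_BSIZE block 1000, which converts to sector
500.  If any layer uses that 1000 as a sector number without converting,
it is already one past the last sector, which would match a failure
about halfway through the disk.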
 
>   I can see two different ways to correct this:  changing the ffs and
> block special device access to use a 512-byte b_blkno, or change the
> character special device access and the cd9660 file system to use actual
> device block numbers.  I'm not certain which would be the most "correct"
> way to go.
 
It's interesting.  The buf struct has both a b_blkno field and a b_lblkno
field.  The sd driver and physio() don't seem to reference b_lblkno.
I first played with just making physio() and the sd driver use real blocks,
but physio() doesn't have immediate access to the device block size (it
could get it, but that would duplicate later effort).  Also, the wd
driver assumes 512-byte blocks (I run a mixed IDE and SCSI
system, call me sick).  So my first change made the fsck of the
IDE drive barf (I don't know if it worked for SCSI; I boot off of IDE).
 
>   The msdosfs and adosfs file systems could also be potentially
> affected, but at the moment, the adosfs file system will only work with
> 512 byte blocks, and I think the msdosfs file system only works with 512
> byte blocks as well.
 
No, MS-DOS doesn't deal with non-512-byte blocks.  Some very new drivers
support >512-byte blocks, but only at the device driver level, by faking
out DOS.  Older drivers just don't support non-512-byte block devices.
 
>   I would be nice to have this fixed in the 1.0 release, but I suspect
> making the changes and fully testing them would not be feasible prior to
> the release [when ever that might be].
 
I would love to help make this work (I've got a couple of older but very
big SCSI drives with 1024-byte blocks that I scrounged and that
otherwise work).
 
I've played a bit with having physio() stuff the block value into both
b_lblkno and b_blkno.  
 
--- kern_physio.c.orig  Fri Aug 26 11:35:59 1994
+++ kern_physio.c       Fri Aug 26 11:36:05 1994
@@ -141,6 +141,7 @@
             splx(s);
 
             /* [set up the buffer for a maximum-sized transfer] */
+            bp->b_lblkno = btodb(uio->uio_offset);
             bp->b_blkno = btodb(uio->uio_offset);
             bp->b_bcount = iovp->iov_len;
             bp->b_data = iovp->iov_base;
 
The sd driver then says...
 
 
--- sd.c.orig   Fri Aug 26 11:36:53 1994
+++ sd.c        Fri Aug 26 11:37:08 1994
@@ -493,7 +493,17 @@
                 *
                 *  First, translate the block to absolute
                 */
-               blkno = bp->b_blkno / (sd->params.blksize / DEV_BSIZE);
+
+               /*
+                * The FFS and block device callers get this right.
+                * Some callers don't have easy access to the
+                * device info, so they pass down a DEV_BSIZE value in lblkno
+                * as a clue that these are DEV_BSIZE blocks.
+                */
+
+               if (bp->b_lblkno && (sd->params.blksize != DEV_BSIZE)) {
+                   blkno = bp->b_blkno = bp->b_lblkno / 
+                        (sd->params.blksize / DEV_BSIZE);
+               } else {
+                   blkno = bp->b_blkno;
+               }
 
So far it doesn't quite work.  I still get the symptom that
I can label the disk, but fsck barfs, and the kernel complains about
no disk label when first booting.

I also haven't made the physio()-like changes to the CD code, since
I have no CD drive to test with.

An example of how this can show itself: reading in one chunk from block 0
works because there is no block/offset calculation.  Reading it in
512-byte chunks gives a different result...

Does anyone more in the know want to comment on this approach?  Is there a
better way?

nirvana::mark-109> root dd if=/dev/rsd1d bs=32768 count=1 of=/tmp/dd1
1+0 records in
1+0 records out
32768 bytes transferred in 1 secs (32768 bytes/sec)
nirvana::mark-112> root dd if=/dev/rsd1d bs=512 count=64 of=/tmp/dd2
64+0 records in
64+0 records out
32768 bytes transferred in 1 secs (32768 bytes/sec)
nirvana::mark-113> cmp /tmp/dd1 /tmp/dd2
/tmp/dd1 /tmp/dd2 differ: char 513, line 3
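
One way a mismatch at byte 513 could arise can be sketched in user space.
This is a toy model under stated assumptions, not the real sd or physio()
code: it assumes 1024-byte sectors and a hypothetical read path
(model_read) that divides the DEV_BSIZE block number down to a sector
number and always starts the transfer at the beginning of that sector,
so the half-sector offset is lost.  Whether the actual driver does this
is unconfirmed.

```c
#include <assert.h>
#include <string.h>

#define DEV_BSIZE   512
#define SECTOR_SIZE 1024            /* assumed physical sector size */
#define NBYTES      32768           /* same span as the dd runs above */

static unsigned char disk[NBYTES];  /* stand-in for the start of the disk */

/* Give every 512-byte piece of the fake disk a distinct fill value. */
static void fill_disk(void)
{
    for (long i = 0; i < NBYTES; i++)
        disk[i] = (unsigned char)(i / DEV_BSIZE);
}

/*
 * Hypothetical raw read: blkno is in DEV_BSIZE units, and the transfer
 * always starts at the beginning of the sector the division lands on.
 */
static void model_read(unsigned char *dst, long blkno, size_t len)
{
    long sector = blkno / (SECTOR_SIZE / DEV_BSIZE);

    assert((size_t)sector * SECTOR_SIZE + len <= NBYTES);
    memcpy(dst, disk + sector * SECTOR_SIZE, len);
}

/* Returns the 1-based offset of the first differing byte, or 0 if none. */
static long first_difference(void)
{
    static unsigned char one[NBYTES], many[NBYTES];

    fill_disk();
    model_read(one, 0, NBYTES);                    /* like bs=32768 count=1 */
    for (long b = 0; b < NBYTES / DEV_BSIZE; b++)  /* like bs=512 count=64 */
        model_read(many + b * DEV_BSIZE, b, DEV_BSIZE);

    for (long i = 0; i < NBYTES; i++)
        if (one[i] != many[i])
            return i + 1;
    return 0;
}
```

In this model the second 512-byte read (block 1) lands on sector 0 again
and returns the first half of that sector a second time, so
first_difference() reports byte 513, the same position cmp flags.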


 
P.S.  For those of you who've got my Email address and have sent to
        me personally in the last few weeks @Good.com it finally is
        really working again.  I've got my new provider mostly straightened
        out, mail is 100%, now if I could just get them to route
        my Class C net...
-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Mark P. Gooderum                           USSnail:  Good Creations
Senior Consultant - Operating Systems Group          3029 Blackstone Ave. So.
  "Working hard to be hardly working..."             St. Louis Park, MN 55416
EMail:       mark@Good.com                 Voice:    (612) 922-3953
Interactive: mark@nirvana.Good.com         Fax:      (612) 922-2676
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
