Subject: Fwd: Re: port-i386/1238: Reading past end of IDE disk causes EIO, not EOF
To: None <current-users@NetBSD.org>
From: D'Arcy J.M. Cain <darcy@NetBSD.org>
List: current-users
Date: 03/08/2003 10:51:25
I am forwarding this message, sent to the gnats list, to current-users to see if we
can get some wider discussion on the subject.  This is another one of those 
dusty PRs that I am hoping to badger people into working on.

Note that the patches suggested here have not even been pulled up to release 
as far as I can tell.  It has been suggested that the problem with the fix is 
that it can prevent a new disklabel from being written to disks in some 
circumstances.  I'm not clear on what those circumstances are.

----------  Forwarded Message  ----------

Subject: Re: port-i386/1238: Reading past end of IDE disk causes EIO, not EOF
Date: Sun, 16 Feb 2003 18:18:03 -0800 (PST)
From: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.ORG
Cc: "D'Arcy J.M. Cain" <darcy@NetBSD.ORG>

Some additional notes about this long-standing bug prompted by hearing
that <darcy@netbsd.org> is currently looking into it...

Although the original bug report said SCSI disks had been tested on an
earlier NetBSD release and found to be working correctly, code
inspection suggests SCSI disks are in fact affected by the bug and
were already affected by the time the bug report was sent (the bug was
introduced simultaneously into the wd and sd drivers between the two
releases the tests were run on).

The bug was introduced as a side effect of the following commit to
what is now sys/dev/ata/wd.c:

   revision 1.138
   date: 1995/04/15 05:02:56;  author: mycroft;  state: Exp;  lines: +3 -2
   Don't boundary check I/O to the `raw' partition.

The corresponding sd.c commit is:

   revision 1.68
   date: 1995/04/15 05:01:29;  author: mycroft;  state: Exp;  lines: +3 -2
   Don't boundary check I/O to the `raw' partition.

I'm not sure why this change was made in the first place - perhaps to
avoid getting into chicken-and-egg situations where an incorrect
disklabel makes the disklabel itself out of bounds and impossible to
update, or out of a desire to make the raw device function independent
of the disklabel as a general design principle?

The bug is trivially fixed by reverting the above commits (and I have
been doing that on my own systems for the last five years or so), but
that makes the correct functioning of the raw partition contingent on
having a correct disklabel in place.  This has not caused any problems
on my systems, but is it desirable?  A better solution would be to
bounds check raw partition accesses against the actual size of the
disk obtained during autoconfiguration.  I wonder what other OSes
do...
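
To make that last idea concrete, here is a minimal userland sketch in C of
a raw-partition check against the probed media size instead of the
disklabel.  It is not the code from wd.c or sd.c: struct xfer,
raw_bounds_check() and the -1/0/1 return convention are hypothetical names
and choices made for illustration, and the choice of EINVAL for a request
that starts beyond the end of the media is likewise an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the few struct buf fields that matter here. */
struct xfer {
	int64_t	blkno;	/* starting sector of the request */
	int64_t	bcount;	/* requested length in bytes */
	int64_t	resid;	/* bytes left untransferred */
	int	error;	/* errno value, 0 if none */
};

/*
 * Check a raw-partition transfer against the media size (in sectors)
 * obtained from the hardware, not from the disklabel.
 *
 * Returns  1 if the (possibly truncated) transfer should proceed,
 *          0 if the request starts exactly at the end of the media
 *            (EOF: no error, nothing transferred),
 *         -1 if the request is rejected, with x->error set.
 */
int
raw_bounds_check(struct xfer *x, int64_t media_sectors, int64_t secsize)
{
	int64_t nsect, avail;

	/* Reject malformed requests and partial-sector transfers. */
	if (secsize <= 0 || x->blkno < 0 || x->bcount < 0 ||
	    x->bcount % secsize != 0) {
		x->error = EINVAL;
		return -1;
	}

	/* Starting beyond the end of the media is an error. */
	if (x->blkno > media_sectors) {
		x->error = EINVAL;
		return -1;
	}

	/* Starting exactly at the end is EOF, not EIO. */
	if (x->blkno == media_sectors) {
		x->resid = x->bcount;
		return 0;
	}

	/* Truncate a transfer that would run off the end of the media. */
	nsect = x->bcount / secsize;
	avail = media_sectors - x->blkno;
	if (nsect > avail)
		x->bcount = avail * secsize;

	return 1;
}
```

With a check like this, a read that begins at the last sector is shortened
to one sector, the following read sees EOF, and neither result depends on
the contents of the disklabel.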

In addition to sd.c and wd.c, code inspection shows there are several
other disk drivers that don't boundary check I/O to the raw partition
and therefore probably suffer from the bug.  These include the
following:

   raidframe/rf_netbsdkintf.c
   mca/ed_mca.c
   ofw/ofdisk.c
   scsipi/cd.c
   vnd.c

Disk drivers that do boundary check I/O to the raw partition and
therefore probably don't suffer from the bug include:

   isa/fd.c
   qbus/rl.c
   vme/xd.c
   vme/xy.c
   mscp/mscp_disk.c

If nothing else, we should be consistent...
--
Andreas Gustafsson, gson@gson.org

-------------------------------------------------------

-- 
D'Arcy J.M. Cain <darcy@netbsd.org>
http://www.NetBSD.org/