Subject: kern/36716: cd(4) problem with transfers exceeding 65535 bytes
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <rumble@ephemeral.org>
List: netbsd-bugs
Date: 08/01/2007 07:35:00
>Number:         36716
>Category:       kern
>Synopsis:       cd(4) problem with transfers exceeding 65535 bytes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Aug 01 07:35:00 +0000 2007
>Originator:     Stephen M. Rumble
>Release:        4.99.25
>Organization:
>Environment:
NetBSD x31.ephemeral.org 4.99.25 NetBSD 4.99.25 (X31) #1: Tue Jul 31 22:36:09 EDT 2007  rumble@x31.ephemeral.org:/usr/src/sys/arch/i386/compile/X31 i386
>Description:
When reading an EFS file system from CD, cd(4) will often start transfers greater than 65535 bytes. This happens because bounce buffering often rounds accesses up to 33 sectors (67584 bytes, 66k). The following errors result:

On my Thinkpad X31:

    piixide0:1: unable to load xfer table DMA map for drive 0, error=22
    piixide0:1:0: lost interrupt
        type: atapi tc_bcount: 67584 tc_skip: 0

    ... repeats indefinitely ...

In QEMU (0.8.2nb2):

    piixide0:1:0 lost interrupt
        type: atapi tc_bcount: 0 tc_skip: 67584

    ... repeats indefinitely ...

I have no idea where error gets set to 22 (EINVAL) in the X31 case, as that doesn't appear to be a return value of bus_dmamem_map. The DMA map error comes from sys/dev/pci/pciide_common.c:567. The lost interrupt error comes from sys/dev/ic/wdc.c:1324.
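For reference, here is a rough sketch of how a request could grow to the 67584 bytes seen in tc_bcount above. It assumes 2048-byte CD sectors (67584 / 33) and a 64k request that starts one 512-byte block into a sector. The function name and the rounding rule are illustrative only and are not taken from cd.c.

```c
#include <stdint.h>

#define CD_SECTOR_SIZE 2048u	/* assumed physical CD sector size */

/*
 * Hypothetical helper: number of bytes the device has to transfer once
 * a byte range is widened to whole CD sectors, as a bounce buffer would.
 */
static uint32_t
rounded_xfer_bytes(uint32_t byte_offset, uint32_t byte_count)
{
	uint32_t first, last;

	if (byte_count == 0)
		return 0;

	/* First sector touched, and one past the last sector touched. */
	first = byte_offset / CD_SECTOR_SIZE;
	last = (byte_offset + byte_count + CD_SECTOR_SIZE - 1) / CD_SECTOR_SIZE;

	return (last - first) * CD_SECTOR_SIZE;
}
```

Under these assumptions, rounded_xfer_bytes(512, 65536) covers 33 sectors and returns 67584, which no longer fits in 16 bits, while an aligned request of the same length stays at 65536.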

I've managed to work around the problem in two different ways.

The first alters dev/scsipi/cd.c and limits bounce buffers to a maximum of MAXPHYS bytes. Any additional I/O is done in another buf. A patch can be found here:
    http://mail-index.netbsd.org/tech-kern/2007/07/28/0005.html
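
The idea behind the first workaround can be sketched roughly as below. This is not the patch itself; the function name is made up for illustration and the 64k value for MAXPHYS is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAXPHYS (64u * 1024u)	/* assumed value of MAXPHYS */

/*
 * Hypothetical helper: split a bounce-buffered request of `total' bytes
 * into pieces of at most SKETCH_MAXPHYS bytes, one per buf.  Stores the
 * piece lengths in sizes[] and returns how many pieces were needed, or
 * 0 if sizes[] is too small (or total is 0).
 */
static size_t
split_by_maxphys(uint32_t total, uint32_t *sizes, size_t nsizes)
{
	size_t n = 0;

	while (total > 0) {
		uint32_t len = total < SKETCH_MAXPHYS ? total : SKETCH_MAXPHYS;

		if (n == nsizes)
			return 0;
		sizes[n++] = len;
		total -= len;
	}
	return n;
}
```

With these assumptions a 67584-byte request becomes one 65536-byte buf followed by one 2048-byte buf.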

The second workaround alters dev/scsipi/atapi_wdc.c line 598 to read as follows:
    xfer->c_bcount <= 0xfffd ? xfer->c_bcount : 0xfffd,

Presently transfers are limited to 0xffff (the highest value representable in the cylinder registers). Setting the limit to 0xfffd or lower results in normal function. 0xfffe (which, if I understand the ATAPI docs correctly, is the same as 0xffff) results in the same error.
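
As a standalone illustration of the clamp in the second workaround, the expression above can be written as a small function. The function name and the separate limit parameter are hypothetical; only the ternary itself comes from the modified line.

```c
#include <stdint.h>

/*
 * Hypothetical helper: the byte count handed to the device for one
 * read is the remaining transfer size, capped at `limit' (0xffff at
 * present, 0xfffd with the workaround).
 */
static uint16_t
atapi_byte_count_limit(uint32_t c_bcount, uint16_t limit)
{
	return (uint16_t)(c_bcount <= limit ? c_bcount : limit);
}
```

For the failing 67584-byte transfer this yields 0xffff with the current limit and 0xfffd with the workaround; requests smaller than the limit pass through unchanged.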

I don't understand why either workaround succeeds. It seems clear that the problem has to do with exceeding 0xffff somewhere, but whether it is related to MAXPHYS or to the cylinder register limitations, I do not know.

The first workaround simply limits the size of the request from the scsipi layer, but the individual device reads can still be as large as 0xffff bytes. The second gets a larger request from scsipi, but limits the individual device reads to <= 0xfffd bytes. Both work independently, which seems especially odd.
>How-To-Repeat:
It should be easily repeatable with an atapi cdrom when used with a disklabel where d_secsize != 2048.
>Fix:
Workarounds exist, but the true cause of the errors is unknown.