tech-kern: Read from CD-ROM hangs in uvn

Subject: Read from CD-ROM hangs in uvn_fp2...
To: None <tech-kern@netbsd.org>
From: gabriel rosenkoetter <gr@eclipsed.net>
List: tech-kern
Date: 07/03/2001 00:08:21
I've been having problems with processes hanging unkillably in uvn_fp2 while reading from these two different SCSI CD-ROM drives (a Plextor CD-RW and an Apple/Matshita CD-ROM):

cd0 at scsibus0 target 1 lun 0: <PLEXTOR, CD-R   PX-W1210S, 1.01> SCSI2 5/cdrom removable
cd0: sync (100.0ns offset 15), 8-bit (10.000MB/s) transfers
cd1 at scsibus0 target 3 lun 0: <MATSHITA, CD-ROM CR-8012, 1.0g> SCSI2 5/cdrom removable
cd1: sync (100.0ns offset 8), 8-bit (10.000MB/s) transfers

Both drives are connected to this Adaptec 2940 SCSI card (or 2940U? I don't actually remember which one I bought):

ahc0 at pci0 dev 17 function 0
ahc0: interrupting at irq 15
ahc0: aic7880 Single Channel A, SCSI Id=7, 16/255 SCBs
scsibus0 at ahc0: 8 targets, 8 luns per target

Before I file a PR, I want to make sure this isn't already a known problem, or even one that has been fixed in a kernel newer than mine (1.5W from around 6/21/2001). Anything in parentheses below is at best an educated guess and at worst completely unfounded speculation. (I haven't done any real investigation yet because it doesn't bite me all that often, though I could easily do some tinkering.)

The process that actually hangs responds to no signals at all, not even SIGKILL. (Is it already in the middle of handling a signal, so that new ones are queued until the handler exits, which it never does?)

Any other process that tries to access the device also hangs. It ignores SIGINT and SIGTERM, but it does die on SIGKILL. (Are these processes waiting for the first process to release the device?)

When the first process hangs, I get a lot of:

uvn_flush: obj=0xdc6849e8, offset=0x353000.  error 45
uvn_flush: WARNING: changes to page may be lost!

on the console, with a steadily changing series of obj addresses and offsets. (My kernel config has PCIVERBOSE and SCSIVERBOSE but not PCI_CONFIG_DUMP or PCIBIOSVERBOSE. I could add those two if they might yield more helpful information.)

The really painful part is that it's completely impossible to unmount the file system the process is reading from, or even to run sync successfully (umount and sync clearly fall into the "other process" category above). As a result, none of my file systems, including those on more volatile media (like, say, *writeable* disks), are guaranteed to get sync()ed properly on halt or reboot. In fact, the machine neither halts nor reboots. I have to kick it manually at the power supply. Every file system then has to be fsck()ed on the next boot, sometimes to the point of clearing inodes, which always spooks me a bit.

Most recently I ran into a really nasty situation while using

tar cf - /mnt/some_dir | ( cd /dir/on/ide/disk && tar xpf - )

to move a large number of files off a CD-R. Once tar hung, trying to access either /dev/cd0d or the directory tar was writing into caused the "other process" behavior above. (Other directories are fine. I'm playing mp3s off the IDE disk in question as I type.)

This hanging behavior is by no means limited to tar, of course.
I've seen it mostly with mpg123 and xmms (guess what I keep a lot
of on CD-Rs), but just about anything dealing with a lot of reads
and writes on an ISO-9660 file system runs into it.

Is this because of UBC handling the ISO file system on the CD-R?

Or is it just a result of UBC still being under development?

Is there some tuning I can do in my kernel config or with sysctl that might make it crop up less often? (It isn't reliably reproducible, but I can trigger it fairly easily just by playing mp3s off a CD-R for about forty minutes. Getting some ddb output or similar information is therefore a possibility.)

-- 
       ~ g r @ eclipsed.net