Subject: Re: panic from aha driver after "error loading dma map"..
To: Bill Sommerfeld <sommerfeld@orchard.east-arlington.ma.us>
From: Jason Thorpe <thorpej@nas.nasa.gov>
List: port-i386
Date: 06/16/1997 12:49:31
On Mon, 16 Jun 1997 09:51:19 -0400 
 Bill Sommerfeld <sommerfeld@orchard.east-arlington.ma.us> wrote:

 > aha0: aha_scsi_cmd, error 22 loading dma map

...ok, what this is telling you is that the requested transfer size
was greater than the largest transfer size that can be mapped by that
ccb's dma map.  Now, the ccb dma maps are created to transfer AHA_MAXXFER
bytes... AHA_MAXXFER is defined as:

	((AHA_NSEG - 1) << PGSHIFT)

..which is ((17 - 1) << PGSHIFT) == 65536 == MAXPHYS.  I.e. "that should
never happen".  It might be worth a DIAGNOSTIC check to see where that
comes from... Have you bumped MAXPHYS at all?
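
As a rough illustration of the arithmetic above (a user-space sketch, not
the driver source): PGSHIFT of 12 and MAXPHYS of 64 KB are assumed i386
values here, and xfer_fits_map() is a hypothetical helper standing in for
the kind of DIAGNOSTIC check suggested.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration: 4 KB pages and a 64 KB MAXPHYS. */
#define PGSHIFT		12
#define MAXPHYS		(64 * 1024)

#define AHA_NSEG	17
#define AHA_MAXXFER	((AHA_NSEG - 1) << PGSHIFT)

/* Sanity check: the largest mappable transfer should equal MAXPHYS. */
static void
aha_maxxfer_sanity(void)
{
	assert(AHA_MAXXFER == MAXPHYS);
}

/*
 * Hypothetical DIAGNOSTIC-style check: returns 1 if a transfer of
 * `len' bytes fits in a map created for AHA_MAXXFER bytes, else 0.
 */
static int
xfer_fits_map(size_t len)
{
	return len <= AHA_MAXXFER;
}
```

With those assumed values, any request of up to MAXPHYS bytes fits, so an
EINVAL from the map load would point at a request larger than MAXPHYS.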

...now, looking at the check for error at line 1216 of aha.c, it's
pretty clear that the transfer will be aborted... However, see below...

 > vm_fault(0xf8276000, 0xf8970000, 3, 0)
 > kernel page fault trap, code=0
 > stopped at bcopy+1a:   repe movsl (%esi),%es:(%edi)
 > 
 > "error 22" would appear to be EINVAL.
 > 
 > %edi contained 0xf8970000, which is the address mentioned in the vm_fault..
 > 
 > the traceback was:
 > 	bcopy+1a
 > 	aha_done+0x3f
 > 	aha_finish_ccbs+0xde
 > 	aha_intr+0x73

Now, THIS is interesting... even ignoring the fact that ddb seems to have
left out the indirect call to _isa_bus_dmamap_sync().

The thing that makes me wonder about this the most is the faulting address...
it's actually a very plausible kernel virtual address!  I just spoke with
Bill on the phone, and he indicated that his backups might have been going
off during this time... this would mean that there would have been tape
i/o going on, which uses the character device... Character device i/o
has the user buffer mapped into kva space (via vmapbuf()) ... this
original kva is kept around for sync'ing at interrupt time (i.e. to
perform the bounce from dma buffer to user buffer).

Now, if this ccb were somehow used "stale", after a raw transfer,
it could cause that kva to be invalid (i.e. buffer had been vunmapbuf()'d).

However, it's interesting to note that the bcopy is only performed if
the IS_BOUNCING flag is set... That flag _should_ be clear if the dma
map load fails... (see _isa_bus_dmamap_load()).
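
The expected behaviour described above can be modelled in user space. This
is only a toy sketch of the idea, not the code in _isa_bus_dmamap_load():
struct toy_map, TOY_IS_BOUNCING, TOY_MAXXFER, toy_load() and
toy_sync_postread() are all hypothetical names, and the buffer size is
shrunk to 16 bytes for convenience.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_IS_BOUNCING	0x01	/* hypothetical flag, mirrors the idea only */
#define TOY_MAXXFER	16	/* toy limit, far smaller than the real one */

struct toy_map {
	int		flags;
	void		*origbuf;	/* caller's buffer (the "kva") */
	size_t		len;
	unsigned char	bounce[TOY_MAXXFER];
};

/* Load a buffer into the map; on failure, leave no stale bounce state. */
static int
toy_load(struct toy_map *map, void *buf, size_t len)
{
	if (len > TOY_MAXXFER) {
		map->flags &= ~TOY_IS_BOUNCING;
		map->origbuf = NULL;
		map->len = 0;
		return EINVAL;
	}
	map->flags |= TOY_IS_BOUNCING;
	map->origbuf = buf;
	map->len = len;
	return 0;
}

/* Copy bounce -> caller buffer only if bouncing; returns 1 if it copied. */
static int
toy_sync_postread(struct toy_map *map)
{
	if ((map->flags & TOY_IS_BOUNCING) == 0)
		return 0;
	assert(map->origbuf != NULL);
	memcpy(map->origbuf, map->bounce, map->len);
	return 1;
}
```

In this model a failed load clears the flag and the saved pointer, so a
later sync cannot copy into an address left over from an earlier transfer.
A bcopy to a stale kva would require the flag to survive somewhere.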

Hmm, this is somewhat interesting.. and might be related to the
machine check problem that has been reported using the aha driver
on the Alpha... I will take a look..

AH!  And I think I might have an idea... I think we might be getting
interrupted at an unfortunate time... I am going to place a few spl*
calls in a place or two, and I may need you to try and reproduce
the problem... (i.e. run the build again when your tape backups
go off... that's one of the things that's making me suspect being
interrupted during the dma map load...)
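
For readers unfamiliar with the pattern, the usual shape of such protection
is "s = splbio(); ...critical section...; splx(s);". The following is a
user-space toy model of why that helps, not the actual fix: every toy_*
name is hypothetical, and the "interrupt" is simulated by an ordinary
function call made at the worst possible moment.

```c
#include <assert.h>

/* Toy stand-ins; these are NOT the kernel's splbio()/splx(). */
static int toy_ipl;		/* 1 = block-I/O interrupts masked */
static int toy_pending;		/* interrupt arrived while masked */
static int toy_map_busy;	/* map is in the middle of a load */
static int toy_saw_half_loaded;	/* handler ran against a half-loaded map */

static void
toy_handler(void)
{
	if (toy_map_busy)
		toy_saw_half_loaded = 1;
}

/* Deliver an interrupt now, or defer it if the level is raised. */
static void
toy_raise_intr(void)
{
	if (toy_ipl)
		toy_pending = 1;
	else
		toy_handler();
}

static int
toy_splbio(void)
{
	int s = toy_ipl;

	toy_ipl = 1;
	return s;
}

/* Restore the saved level and run any interrupt deferred meanwhile. */
static void
toy_splx(int s)
{
	assert(s == 0 || s == 1);
	toy_ipl = s;
	if (toy_ipl == 0 && toy_pending) {
		toy_pending = 0;
		toy_handler();
	}
}

/* Load the map, optionally protected by the toy spl calls. */
static void
toy_load_map(int protect)
{
	int s = 0;

	if (protect)
		s = toy_splbio();
	toy_map_busy = 1;
	toy_raise_intr();	/* simulated interrupt mid-load */
	toy_map_busy = 0;
	if (protect)
		toy_splx(s);
}
```

In the unprotected case the handler sees the half-loaded map; in the
protected case the interrupt is held until the load has finished.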

Jason R. Thorpe                                       thorpej@nas.nasa.gov
NASA Ames Research Center                               Home: 408.866.1912
NAS: M/S 258-6                                          Work: 415.604.0935
Moffett Field, CA 94035                                Pager: 415.428.6939