Subject: kern/37456: ahc driver can lead to crash if cdrom drive misbehaves
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Pavel Cahyna <pcah8322@artax.karlin.mff.cuni.cz>
List: netbsd-bugs
Date: 11/30/2007 21:45:00
>Number:         37456
>Category:       kern
>Synopsis:       ahc driver can lead to crash if cdrom drive misbehaves
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Nov 30 21:45:00 +0000 2007
>Originator:     Pavel Cahyna
>Release:        NetBSD 4.0_RC3
>Organization:
>Environment:
System: NetBSD beta 4.0_RC3 NetBSD 4.0_RC3 (164LX) #7: Sat Oct 27 18:15:56 CEST 2007 pavel@beta:/mnt/obj/alpha-4/164LX alpha
Architecture: alpha
Machine: alpha
>Description:
I have tried to read a CD in a CD drive connected to an aic7899
Adaptec HBA. The CD was burned on a fast drive and my drive did not
like it apparently. Here's what happened:
ahc0:SCB 0x3e - timed out
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
ahc0: Dumping Card State while idle, at SEQADDR 0x9
Card was paused
ACCUM = 0x4, SINDEX = 0x57, DINDEX = 0x26, ARG_2 = 0xf
HCNT = 0x0 SCBPTR = 0x0
SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0] 
LASTPHASE[0x1] SCSISEQ[0x12] SBLKCTL[0x6] SCSIRATE[0x0] 
SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x0] SSTAT1[0x8] 
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xa4] 
SXFRCTL0[0x80] DFCNTRL[0x0] DFSTATUS[0x89] 
STACK: 0x0 0x16b 0x180 0x3
SCB count = 64
Kernel NEXTQSCB = 63
Card NEXTQSCB = 63
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 0:62 
QOUTFIFO entries: 
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Sequencer SCB Info: 
  0 SCB_CONTROL[0x44] 
SCB_SCSIID[0x57] SCB_LUN[0x0] SCB_TAG[0x3e] 
  1 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  2 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  3 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  4 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  5 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  6 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  7 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  8 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  9 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 10 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 11 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 12 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 13 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 14 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 15 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 16 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 17 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 18 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 19 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 20 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 21 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 22 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 23 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 24 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 25 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 26 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 27 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 28 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 29 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 30 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 31 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
Pending list: 
 62 SCB_CONTROL[0x40] 
SCB_SCSIID[0x57] SCB_LUN[0x0] 
Kernel Free SCB list: 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 
Untagged Q(5): 62 

>>>>>>>>>>>>>>>>>
sg[0] - Addr 0x55d60000 : Length 8192
sg[1] - Addr 0x56d62000 : Length 16384
sg[2] - Addr 0x56766000 : Length 8192
ahc0:Queuing a BDR SCB
cd0(ahc0:0:5:0): ahc0: no longer in timeout, status = 0
ahc0:A:5: no active SCB for reconnecting target - issuing BUS DEVICE RESET
SAVED_SCSIID == 0x57, SAVED_LUN == 0x0, ARG_1 == 0x3e ACCUM = 0x0
SEQ_FLAGS == 0xc0, SCBPTR == 0x0, BTT == 0x3e, SINDEX == 0x31
SCSIID == 0x0, SCB_SCSIID == 0x57, SCB_LUN == 0x0, SCB_TAG == 0xff, SCB_CONTROL == 0x0
SCSIBUSL == 0x80, SCSISIGI == 0xe6
SXFRCTL0 == 0x88
SEQCTL == 0x10
>>>>>>>>>>>>>>>>>> Dump Card State Begins <<<<<<<<<<<<<<<<<
ahc0: Dumping Card State in Message-in phase, at SEQADDR 0x1ae
Card was paused
ACCUM = 0x0, SINDEX = 0x31, DINDEX = 0xe4, ARG_2 = 0xf
HCNT = 0x0 SCBPTR = 0x0
SCSIPHASE[0x8] SCSISIGI[0xe6] ERROR[0x0] SCSIBUSL[0x80] 
LASTPHASE[0xe0] SCSISEQ[0x12] SBLKCTL[0x6] SCSIRATE[0x0] 
SEQCTL[0x10] SEQ_FLAGS[0xc0] SSTAT0[0x2] SSTAT1[0x11] 
SSTAT2[0x0] SSTAT3[0x0] SIMODE0[0x8] SIMODE1[0xac] 
SXFRCTL0[0x88] DFCNTRL[0x0] DFSTATUS[0x89] 
STACK: 0x147 0x0 0x16b 0x180
SCB count = 64
Kernel NEXTQSCB = 63
Card NEXTQSCB = 63
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 
Sequencer SCB Info: 
  0 SCB_CONTROL[0x0] 
SCB_SCSIID[0x57] SCB_LUN[0x0] SCB_TAG[0xff] 
  1 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  2 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  3 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  4 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  5 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  6 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  7 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  8 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
  9 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 10 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 11 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 12 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 13 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 14 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 15 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 16 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 17 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 18 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 19 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 20 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 21 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 22 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 23 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 24 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 25 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 26 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 27 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 28 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 29 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 30 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
 31 SCB_CONTROL[0x0] 
SCB_SCSIID[0xff] SCB_LUN[0xff] SCB_TAG[0xff] 
Pending list: 
Kernel Free SCB list: 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 

>>>>>>>>>>>>>>>>>
ahc0:A:5:0: Target did not send an IDENTIFY message. LASTPHASE = 0xe0.
ahc0:A:5:0: Protocol violation in Message-in phase.  Attempting to abort.

CPU 0: fatal kernel trap:

CPU 0    trap entry = 0x2 (memory management fault)
CPU 0    a0         = 0x0
CPU 0    a1         = 0x1
CPU 0    a2         = 0x0
CPU 0    pc         = 0xfffffc0000439ec0
CPU 0    ra         = 0xfffffc0000439a60
CPU 0    pv         = 0xfffffc00007e4e60
CPU 0    curlwp    = 0xfffffc00236f8990
CPU 0        pid = 402, comm = syslogd

panic: trap
cpu_Debugger() at netbsd:cpu_Debugger+0x4
panic() at netbsd:panic+0x1d8
trap() at netbsd:trap+0x368
XentMM() at netbsd:XentMM+0x20
--- memory management fault (from ipl 4) ---
ahc_handle_scsiint() at netbsd:ahc_handle_scsiint+0xfe0
ahc_intr() at netbsd:ahc_intr+0x208
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x6c
eb164_iointr() at netbsd:eb164_iointr+0x40
interrupt() at netbsd:interrupt+0x1e4
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
softdep_setup_allocdirect() at netbsd:softdep_setup_allocdirect+0x290
ffs_balloc_ufs1() at netbsd:ffs_balloc_ufs1+0xe38
ffs_balloc() at netbsd:ffs_balloc+0x60
ufs_gop_alloc() at netbsd:ufs_gop_alloc+0xc0
ufs_balloc_range() at netbsd:ufs_balloc_range+0x220
_prop_array_pool() at 0xfffffc00010627f0
--- root of call graph ---
rebooting...

(gdb) list *(0xfffffc0000439ec0)
0xfffffc0000439ec0 is in ahc_handle_scsiint (/mnt/obj/scratch/netbsd-4/sys/dev/ic/aic7xxx.c:1256).
1251                            tag = SCB_LIST_NULL;
1252                            if (ahc_sent_msg(ahc, AHCMSG_1B, MSG_ABORT_TAG, TRUE)
1253                                || ahc_sent_msg(ahc, AHCMSG_1B, MSG_ABORT, TRUE)) {
1254                                    if (ahc->msgout_buf[ahc->msgout_index - 1]
1255                                        == MSG_ABORT_TAG)
1256                                    tag = scb->hscb->tag;
1257                                    ahc_print_path(ahc, scb);
1258                                    printf("SCB %d - Abort%s Completed.\n",
1259                                           scb->hscb->tag, tag == SCB_LIST_NULL ?
1260                                           "" : " Tag");

(gdb) info line *(0xfffffc0000439ec0)
Line 1256 of "/mnt/obj/scratch/netbsd-4/sys/dev/ic/aic7xxx.c"
   starts at address 0xfffffc0000439ec0 <ahc_handle_scsiint+4064>
   and ends at 0xfffffc0000439ec4 <ahc_handle_scsiint+4068>.

disassemble:
0xfffffc0000439ec0 <ahc_handle_scsiint+4064>:   ldq     t0,0(s4)
0xfffffc0000439ec4 <ahc_handle_scsiint+4068>:   ldah    a0,-29(gp)
0xfffffc0000439ec8 <ahc_handle_scsiint+4072>:   ldq     t12,-30520(gp)
0xfffffc0000439ecc <ahc_handle_scsiint+4076>:   lda     a0,-6075(a0)
0xfffffc0000439ed0 <ahc_handle_scsiint+4080>:   ldbu    s0,27(t0)
0xfffffc0000439ed4 <ahc_handle_scsiint+4084>:   lda     a1,52(s2)
0xfffffc0000439ed8 <ahc_handle_scsiint+4088>:
    jsr ra,(t12),0xfffffc000043d9a0 <ahc_suspend+1296>

Relevant pieces of dmesg:

ahc0 at pci0 dev 6 function 0: Adaptec 3960D Ultra160 SCSI adapter
ahc0: interrupting at eb164 irq 0
ahc0: aic7899: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs
scsibus0 at ahc0: 16 targets, 8 luns per target
ahc1 at pci0 dev 6 function 1: Adaptec 3960D Ultra160 SCSI adapter
ahc1: interrupting at eb164 irq 7
ahc1: aic7899: Ultra160 Wide Channel B, SCSI Id=7, 32/253 SCBs
scsibus1 at ahc1: 16 targets, 8 luns per target

cd0 at scsibus0 target 5 lun 0: <YAMAHA, CRW6416S, 1.0c> cdrom removable
cd0: sync (100.00ns offset 15), 8-bit (10.000MB/s) transfers

>How-To-Repeat:
see above.
>Fix:
not known. I have observed problems with the ahc driver, this CD drive and
CDs that it did not like in earlier releases, too.