NetBSD-Bugs archive


kern/46896: iSCSI initiator ccb_pool gets corrupted

>Number:         46896
>Category:       kern
>Synopsis:       iSCSI initiator ccb_pool gets corrupted
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 03 20:40:00 +0000 2012
>Originator:     Michael L. Hitch
>Release:        NetBSD 6.0_RC1 as of 19-Aug-2012
>Organization:
        Montana State University
>Environment:
System: NetBSD 6.0_RC1 NetBSD 6.0_RC1 (XEN3_DOM0) #43: Sun Sep 2 20:19:33 MDT 2012
Architecture: x86_64
Machine: amd64
>Description:
        After updating to 6.0_RC1, I started a XEN DOMU kernel using an iSCSI
        disk.  I'm fairly certain that I had been able to run this for some time
        previously (netbsd-6 tree as of 24-May).  Shortly after starting the 
        kernel, the iSCSI initiator started reporting no ccbs:

        Aug 30 00:20:11 net5 /netbsd: S2C1: No CCB in run_xfer
        Aug 30 00:20:11 net5 /netbsd: sd1(iscsi0:0:0:0): adapter resource 
        Aug 30 00:20:12 net5 /netbsd: S2C1: No CCB in run_xfer
        Aug 30 00:20:12 net5 /netbsd: sd1(iscsi0:0:0:0): adapter resource 

        I'm running a 6.0_RC1 XEN3_DOM0 kernel (with the iSCSI initiator added
        to the kernel config, since Xen kernels won't load modules), and an i386
        XEN3 DOMU running cacti (lots and lots of disk updates).

        After writing a quick kernel groveler to extract information from the
        various iSCSI initiator tables, I found that the ccb_pool head for
        the session did indeed show the pool as empty.  Dumping out the
        contents of all the ccbs seemed to indicate they were all free, just
        no longer on the free list.

        Session 0xffffa00002945000: id=2
        ccb_pool 0x0000000000000000:0xffffa0000294c588 ccb_throttled 
        ccb[ 0]  0xffffa00002945208 next 0xffffa0000294d3f8 status 0 disp 0 ITT 
        ccb[55]  0xffffa0000294c378 next 0xffffa0000294c168 status 0 disp 0 ITT 
        ccb[56]  0xffffa0000294c588 next 0x0000000000000000 status 0 disp 0 ITT 
        ccb[57]  0xffffa0000294c798 next 0xffffa0000294c588 status 0 disp 0 ITT 

        I was not able to see anything obvious in the changes to sys/dev/iscsi
        that might have caused this.  I then added the ccbs_waiting queue to
        the groveler output and noted that when this condition occurs, the
        tail pointer of its header points to the ccb_pool, which is certainly
        not correct.

        This leads me to suspect that moving ccbs from ccbs_waiting to the
        free pool is where things go wrong.  From looking at the code, a ccb
        on the ccbs_waiting queue is passed to wake_ccb(), which removes it
        from the ccbs_waiting queue.  However, there appears to be nothing
        that prevents something else from getting the same ccb from the
        ccbs_waiting queue and also calling wake_ccb().  The first caller
        removes the ccb from ccbs_waiting and adds it to ccb_pool.  The second
        caller then tries to remove the same ccb from ccbs_waiting and add it
        to ccb_pool, with nasty results.  I'm now working on confirming that
        this is indeed the case (adding some debug code to check and print
        information if it detects this occurring).
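
        The suspected sequence can be reproduced in userland with the
        <sys/queue.h> tail-queue macros.  This is only a sketch: it assumes
        the two queues are TAILQ-style lists, and the stripped-down struct ccb
        and the helper names are hypothetical, not the driver's code.  It
        models the simplest case, where the pool is empty when the ccb is
        freed and the second caller gets as far as the remove.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical, stripped-down ccb: only the queue linkage matters here. */
struct ccb {
	TAILQ_ENTRY(ccb) chain;
};
TAILQ_HEAD(ccb_list, ccb);

/* Simplified stand-in for "take the ccb off ccbs_waiting and free it". */
static void
requeue_to_pool(struct ccb_list *waiting, struct ccb_list *pool,
    struct ccb *c)
{
	TAILQ_REMOVE(waiting, c, chain);
	TAILQ_INSERT_TAIL(pool, c, chain);
}

/*
 * Returns 1 if a second removal of an already-freed ccb leaves the pool
 * head empty and the waiting queue's tail pointer aimed at the pool head.
 */
static int
double_wake_corrupts(void)
{
	struct ccb_list waiting, pool;
	struct ccb c;

	TAILQ_INIT(&waiting);
	TAILQ_INIT(&pool);
	TAILQ_INSERT_TAIL(&waiting, &c, chain);

	/* First caller: the ccb moves from waiting to the (empty) pool. */
	requeue_to_pool(&waiting, &pool, &c);

	/*
	 * Second caller still believes the ccb is on the waiting queue.
	 * The ccb's back pointer now refers to the pool head, so this
	 * remove rewrites the pool head and the waiting tail instead.
	 */
	TAILQ_REMOVE(&waiting, &c, chain);

	return pool.tqh_first == NULL &&
	    waiting.tqh_last == &pool.tqh_first;
}
```

        In this model double_wake_corrupts() returns 1: the pool head reads
        as empty even though the ccb itself still looks free, and the tail
        pointer of the waiting queue refers to the pool head.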

>How-To-Repeat:
        I suspect this problem is relatively rare, and needs something similar
        to my above described setup to get enough random activity with the iSCSI
        target to duplicate.
>Fix:
        If the problem is multiple processing of a ccb on the ccbs_waiting
        queue, try to prevent that from happening, or at least prevent it from
        corrupting the ccb_pool and ccbs_waiting queues.
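
        One possible shape for such a guard, again only as a userland sketch
        under the same TAILQ assumption: record which queue owns the ccb and
        refuse to move it twice.  The where field, the enum, and
        wake_ccb_once() are hypothetical names, not part of the driver.  In
        the kernel the test and the update would also have to run under
        whatever protection covers both queues, which is not modelled here.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical marker for which queue currently owns a ccb. */
enum ccb_where { CCB_IDLE, CCB_ON_WAITING, CCB_ON_POOL };

struct ccb {
	TAILQ_ENTRY(ccb) chain;
	enum ccb_where where;
};
TAILQ_HEAD(ccb_list, ccb);

/*
 * Move a ccb from the waiting queue to the pool at most once.
 * Returns 1 if the ccb was moved, 0 if it was not on the waiting queue.
 */
static int
wake_ccb_once(struct ccb_list *waiting, struct ccb_list *pool,
    struct ccb *c)
{
	if (c->where != CCB_ON_WAITING)
		return 0;	/* someone else already handled this ccb */

	TAILQ_REMOVE(waiting, c, chain);
	TAILQ_INSERT_TAIL(pool, c, chain);
	c->where = CCB_ON_POOL;
	return 1;
}

/* Returns 1 if two wake attempts on the same ccb leave both queues sane. */
static int
guarded_double_wake_is_safe(void)
{
	struct ccb_list waiting, pool;
	struct ccb c;

	TAILQ_INIT(&waiting);
	TAILQ_INIT(&pool);
	TAILQ_INSERT_TAIL(&waiting, &c, chain);
	c.where = CCB_ON_WAITING;

	int first = wake_ccb_once(&waiting, &pool, &c);
	int second = wake_ccb_once(&waiting, &pool, &c);

	return first == 1 && second == 0 &&
	    pool.tqh_first == &c &&
	    waiting.tqh_first == NULL &&
	    waiting.tqh_last == &waiting.tqh_first;
}
```

        A check like this would also be a natural place for the debug
        message mentioned above, since the second caller is exactly the case
        that returns 0.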
