kern/44418: FAST_IPSEC and if_wm kernel panic - may affect the whole network stack

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/44418: FAST_IPSEC and if_wm kernel panic - may affect the whole network stack
From: Wolfgang.Stukenbrock%nagler-company.com@localhost
Date: Wed, 19 Jan 2011 18:55:01 +0000 (UTC)

>Number:         44418
>Category:       kern
>Synopsis:       FAST_IPSEC and if_wm kernel panic - may affect the whole network stack
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 19 18:55:00 +0000 2011
>Originator:     Dr. W. Stukenbrock
>Release:        NetBSD 5.1 - HEAD also
>Organization:
Dr. Nagler & Company GmbH
>Environment:
        
        
System: NetBSD s0g7 5.1 NetBSD 5.1 (NSW-locationGW) #26: Wed Jan 19 17:05:17 CET 2011  ncadmin@s0g7:/usr/src/sys/arch/amd64/compile/NSW-locationGW amd64
Architecture: x86_64
Machine: amd64
>Description:
        The system is a Supermicro X8SIL (3400 chipset) with an L3406.
        Due to problems with the onboard NICs, there is also a dual-port
        NIC (82571EB based) installed.
        The test uses only the two 82571EB NICs. No cable is connected to
        the onboard NICs.
        I've configured a tunnel with ipcomp and esp - the problem happens
        on outgoing packets.

        spdadd -n 172.25.0.0/16 172.16.0.0/16 any -P out ipsec \
          ipcomp/tunnel/62.153.101.247-62.153.101.241/use \
          esp/tunnel/62.153.101.247-62.153.101.241/require;

        Only the outgoing rule is shown. There is a corresponding setup on a
        NetBSD 4.0 system at 62.153.101.241.
        This system is 172.25.0.1 and 62.153.101.247.
        Some traffic (ftp put of a large file) is sent from 172.25.0.2 to
        172.16.3.1, which goes through the tunnel.

        In general it works fine, but .... -> panic


        Problem description based on the NetBSD source code:

        The function key_checkrequest() in /usr/src/sys/netipsec/key.c
        removes the sav entry from the isr structure and allocates a new one
        after that. This leaves NULL in isr->sav for a short time.
        This is done regardless of the number of packets currently using
        this isr (or the SP to which the isr is attached)!

        In the ipcomp code, after compression, the sav in the isr is checked
        in /usr/src/sys/netipsec/xform_ipcomp.c, function ipcomp_output_cb(),
        against a newly allocated sav, and if they do not match an assertion
        is triggered.
        A few statements later, in /usr/src/sys/netipsec/ipsec_output.c,
        function ipsec_process_done(), there are some assertions on isr->sav
        too, and after that the pointer isr->sav is dereferenced.

        Now I get lots of kernel panics when isr->sav is dereferenced,
        because it is a NULL pointer!

        I've tried to figure out the reason, because it was not clear from
        the source.
        I've enabled DIAGNOSTIC and DEBUG, which turns on IPSEC_ASSERT.
        Now the assertion fails sometimes in ipcomp_output_cb() and sometimes
        in ipsec_process_done() - every time with a NULL pointer ...
        But in DDB I always find the correct, valid pointer in the isr
        structure ... ?!?!?!

        I've added some print messages with the actual values in front of
        the assertion for the NULL case, and found that the pointer reads
        NULL and then, on the next read, is the valid pointer again.
        Strange - is another CPU modifying the data, or is it a cache
        problem with the L3406 ????

        I've searched the whole source tree for places where isr->sav gets
        modified and found only one, in key_checkrequest().
        I've changed the way isr->sav is updated - mainly by allocating the
        new one first and then doing some kind of "atomic" update by
        assigning the new pointer, to avoid NULL in isr->sav.
        The NULL problems are gone!
        Hmmmmm .... it is a multiprocessing related problem!
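
        The reordering described above can be sketched in userspace C. This
        is not the kernel patch: struct sa, struct request, sa_alloc(),
        sa_release() and request_update_swap() are hypothetical stand-ins
        for the sav, the isr and the update step in key_checkrequest(), and
        C11 atomics are assumed in place of whatever the kernel would use.
        It only shows the ordering "allocate first, then swap the pointer",
        so that a concurrent reader never sees NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the SA (the "sav" in the report). */
struct sa {
    int      refcnt;   /* not atomic here; a real version needs more care */
    unsigned spi;
};

/* Hypothetical stand-in for the isr; sav plays the role of isr->sav. */
struct request {
    _Atomic(struct sa *) sav;
};

static struct sa *sa_alloc(unsigned spi)
{
    struct sa *s = malloc(sizeof(*s));
    if (s != NULL) {
        s->refcnt = 1;
        s->spi = spi;
    }
    return s;
}

static void sa_release(struct sa *s)
{
    if (s != NULL && --s->refcnt == 0)
        free(s);
}

/*
 * Allocate the replacement first, then publish it with a single
 * exchange. On allocation failure the old pointer simply stays in
 * place, so r->sav is never NULL in between.
 */
static int request_update_swap(struct request *r, unsigned spi)
{
    struct sa *fresh = sa_alloc(spi);
    if (fresh == NULL)
        return -1;

    struct sa *old = atomic_exchange(&r->sav, fresh);
    sa_release(old);   /* a reader may still hold 'old'; see note below */
    return 0;
}
```

        Note that the swap only closes the NULL window. A reader that has
        already loaded the old pointer can still race with the release, so
        this ordering alone does not make the structure MP safe.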

        I've checked the SPL state (by source code analysis) and it is
        splsoftnet all the time.
        I've checked the splsoftnet() implementation on amd64 and found that
        it is mapped to the assembler stub "splraise" in
        /usr/src/sys/arch/amd64/amd64/spl.S. That one simply changes a value
        in "CPUVAR(ILEVEL)". It does nothing with respect to other CPUs - as
        far as I can see ...
        So if an outgoing packet is processed and key_checkrequest() is
        called, this may run concurrently with the call to
        ipcomp_output_cb() by the crypto kernel thread.
        That could be the reason for my problem ...
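
        To make the point about the per-CPU level concrete, here is a small
        userspace model. It is not the code from spl.S: the array
        model_ilevel[], the functions model_splraise() and model_splx(), and
        the MODEL_* constants are hypothetical and only mimic the behaviour
        described above, namely that raising the level writes one CPU's own
        variable and nothing else.

```c
#include <assert.h>

#define MODEL_NCPU        4   /* hypothetical number of CPUs */
#define MODEL_IPL_NONE    0   /* hypothetical level values */
#define MODEL_IPL_SOFTNET 1

/* One priority level per CPU, in the spirit of CPUVAR(ILEVEL). */
static int model_ilevel[MODEL_NCPU];

/* Raise the level of one CPU only and return its previous level. */
static int model_splraise(int cpu, int nlevel)
{
    int olevel = model_ilevel[cpu];

    if (nlevel > olevel)
        model_ilevel[cpu] = nlevel;
    return olevel;
}

/* Restore the level of one CPU. */
static void model_splx(int cpu, int olevel)
{
    model_ilevel[cpu] = olevel;
}
```

        In this model, raising CPU 0 to MODEL_IPL_SOFTNET leaves CPU 1 at
        MODEL_IPL_NONE, and CPU 1 can raise itself to the same level without
        waiting. The level therefore blocks interrupts on the local CPU but
        provides no mutual exclusion between CPUs.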

        I actually do not know whether it is correct for both CPUs to be
        running at SPLSOFTNET at the same time.
        The amd64 implementation suggests that this is a valid situation.
        (And I think it would be very slow to contact all other CPUs when
        changing the SPL level.)

        If any number of CPUs is allowed to run in parallel at SPLSOFTNET,
        then the current implementation of FAST_IPSEC is broken!

        The assumption that there is a valid isr->sav pointer in
        ipsec_process_done() does not hold, because key_checkrequest() may
        have set it to NULL after the NULL check done in
        ipcomp_output_cb() -> panic.
        And this may happen whenever a second packet is forwarded into the
        tunnel at the very moment the ipcomp processing is between the check
        in ipcomp_output_cb() and the access to the pointer in
        ipsec_process_done().

        There must be a mutex to synchronize access to these structures!
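
        A minimal userspace sketch of what such locking could look like,
        using POSIX threads. All names here (struct sa, struct request,
        request_set_sa(), request_get_spi()) are hypothetical; in the
        NetBSD kernel the corresponding primitives would presumably be the
        mutex(9) calls mutex_enter() and mutex_exit(). The point is that
        the NULL check and the use of the pointer sit inside one critical
        section, so a writer cannot change the pointer in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the SA. */
struct sa {
    unsigned spi;
};

/* Hypothetical stand-in for the isr, with a lock guarding sav. */
struct request {
    pthread_mutex_t lock;
    struct sa      *sav;
};

/* Writer: replace the pointer while holding the lock. */
static void request_set_sa(struct request *r, struct sa *fresh)
{
    pthread_mutex_lock(&r->lock);
    r->sav = fresh;
    pthread_mutex_unlock(&r->lock);
}

/*
 * Reader: check for NULL and read through the pointer under the same
 * lock. Returns 0 and stores the SPI on success, -1 if no SA is set.
 */
static int request_get_spi(struct request *r, unsigned *spi)
{
    int error = -1;

    pthread_mutex_lock(&r->lock);
    if (r->sav != NULL) {
        *spi = r->sav->spi;
        error = 0;
    }
    pthread_mutex_unlock(&r->lock);
    return error;
}
```

        The sketch ignores the lifetime of the SA itself; a real fix would
        also have to keep a reference while the packet is being processed
        by the crypto thread.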

        remark: I'm not sure whether other parts of the network stack or
                other devices are affected too ...
                With the change to key_checkrequest() described above, only
                  the NULL panics in ipsec_process_done() are gone.
                I still get additional crashes in if_wm.c when extracting an
                  mbuf from the send queue yields NULL ...
                  This seems to be similar to the NULL crash above, but I've
                  had no time yet to go deeper into it.
                I also still have the problem that sometimes a static SA
                  added with setkey disappears (the outgoing ESP SA in all
                  cases up to now - no time to track that one down yet, but
                  I think it is related to the broken MP sync too). No
                  racoon is started. It also happens if only sshd and the
                  login shell are running on the system. (All other
                  processes were killed after boot.)

        At the moment it looks like the FAST_IPSEC implementation is not MP
        safe and the whole network stack only runs stably because all device
        interrupts are processed (and serialised) on one CPU.
        At least the wm driver seems to get into trouble with the send queue
        if multiple CPUs are going to start packets.

        This is the reason why I classified this PR as critical.


        remark:
        This setup runs fine with NetBSD 4.0 (on other systems, of course,
        with only 2 cores) and I cannot say where the main difference in the
        FAST_IPSEC and/or if_wm implementation is.
        Also, the change of the splxxxx routines from C to assembler should
        not be the reason for anything.
        At the moment I think it runs "stable" with 4.0 due to the slower
        machine with fewer cores and the "other" kernel-internal thread
        scheduler. So 4.0 may be affected as well.

>How-To-Repeat:
        Set up a tunnel with ipcomp and esp - ipcomp alone should be good
        enough too.
        Use a fast machine with at least 4 cores/threads - e.g. Xeon L3406
        (2 cores, 2 threads each).
>Fix:
        Not really known to me yet. (sorry)
        For the NULL panics in ipsec_process_done(), the workaround for
        key_checkrequest() suggested above seems to help.
        But there should be a mutex to synchronise access to the key
        structures.

        But the problem seems to be a much more general one !!!!
        The complete multi-CPU synchronisation in FAST_IPSEC needs a review
        and seems to be unstable at the moment.
        The whole network stack may be affected too - e.g. when accessing
        interface structures.

        From my current point of view the whole network stack is affected
        and the MP synchronisation needs a review.

>Unformatted:
