Current-Users archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg drivers (?))



On 2018/06/03 18:20, 6bone%6bone.informatik.uni-leipzig.de@localhost wrote:
Hello,

I have applied

     http://www.netbsd.org/~msaitoh/ixgbe-eitr-20180522-0.dif
and
     http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif

to netbsd-8 RC1. With these patches the problem seems to be solved.

 Thanks. I've committed the latest patch now!



Thank you for your efforts

Regards
Uwe


On Fri, 1 Jun 2018, Masanobu SAITOH wrote:

Date: Fri, 1 Jun 2018 12:47:32 +0900
From: Masanobu SAITOH <msaitoh%execsw.org@localhost>
To: 6bone%6bone.informatik.uni-leipzig.de@localhost
Cc: msaitoh%execsw.org@localhost, Martin Husemann <martin%duskware.de@localhost>,
    current-users%netbsd.org@localhost
Subject: Re: ixg tester needed (was Re: Problems with netbsd-8 RC1 and ixg
    drivers (?))



  The same diff is at:

     http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180530-0.dif


Updated patch (Fix compile error and ixv patch):

------------------------------------------
Don't call ixgbe_rearm_queues() in ixgbe_local_timer1(). ixgbe_enable_queue()
and ixgbe_disable_queue() try to enable/disable queue interrupt safely. It
has the internal counter. When a queue's MSI-X is received, ixgbe_msix_que()
is called (IPL_NET). This function disable the queue's interrupt by
ixgbe_disable_queue() and issues an softint. ixgbe_handle() queue is called by
the softint (IPL_SOFTNET), process TX,RX and call ixgbe_enable_queue() at the
end.

ixgbe_local_timer1() is a callout and run always on CPU 0 (IPL_SOFTCLOCK).
When ixgbe_rearm_queues() called, an MSI-X interrupt is issued for a specific
queue. It may not CPU 0. If this interrupt's ixgbe_msix_que() is called
and sofint_schedule() is called before the last sofint's softint_execute()
is not called, the softint_schedule() fails because of SOFTINT_PENDING.
It result in breaking ixgbe_{enable,disable}_queue()'s internal counter.

ixgbe_local_timer1() is written not to call ixgbe_rearm_queues() if
the interrupt is disabled, but it's called because of unknown bug or a race.

One solution is to not to use the internal counter, but it's little difficult.
Another solution is stop using ixgbe_rearm_queues() at all.  Essentially,
ixgbe_rearm_queues() is not required (it was added in ixgbe.c rev. 1.43
(2016/12/01)). ixgbe_rearm_queues() helps for lost interrupt problem but
I've never seen it other than ixgbe_rearm_queues() problem.


Index: ixgbe.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.158
diff -u -p -r1.158 ixgbe.c
--- ixgbe.c    30 May 2018 09:17:17 -0000    1.158
+++ ixgbe.c    1 Jun 2018 03:22:05 -0000
@@ -4411,6 +4411,7 @@ ixgbe_local_timer1(void *arg)
    /* Only truely watchdog if all queues show hung */
    if (hung == adapter->num_queues)
        goto watchdog;
+#if 0 /* XXX Avoid unexpectedly disabling interrupt forever (PR#53294) */
    else if (queues != 0) { /* Force an IRQ on queues with work */
        que = adapter->queues;
        for (i = 0; i < adapter->num_queues; i++, que++) {
@@ -4421,6 +4422,7 @@ ixgbe_local_timer1(void *arg)
            mutex_exit(&que->dc_mtx);
        }
    }
+#endif
 out:
    callout_reset(&adapter->timer, hz, ixgbe_local_timer, adapter);
@@ -6643,7 +6645,7 @@ ixgbe_handle_link(void *context)
/************************************************************************
 * ixgbe_rearm_queues
 ************************************************************************/
-static void
+static __inline void
ixgbe_rearm_queues(struct adapter *adapter, u64 queues)
{
    u32 mask;
Index: ixv.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixv.c,v
retrieving revision 1.102
diff -u -p -r1.102 ixv.c
--- ixv.c    30 May 2018 08:35:26 -0000    1.102
+++ ixv.c    1 Jun 2018 03:22:05 -0000
@@ -1266,9 +1266,11 @@ ixv_local_timer_locked(void *arg)
    /* Only truly watchdog if all queues show hung */
    if (hung == adapter->num_queues)
        goto watchdog;
+#if 0
    else if (queues != 0) { /* Force an IRQ on queues with work */
        ixv_rearm_queues(adapter, queues);
    }
+#endif
     callout_reset(&adapter->timer, hz, ixv_local_timer, adapter);
------------------------------------------

The same diff is at:

    http://www.netbsd.org/~msaitoh/ixgbe-norearm-20180531-0.dif

--
-----------------------------------------------
               SAITOH Masanobu (msaitoh%execsw.org@localhost
                                msaitoh%netbsd.org@localhost)



--
-----------------------------------------------
                SAITOH Masanobu (msaitoh%execsw.org@localhost
                                 msaitoh%netbsd.org@localhost)


Home | Main Index | Thread Index | Old Index