tech-kern archive


Re: RFC: add MSI/MSI-X support to NetBSD



Hi dyoung,

(2014/06/07 2:56), David Young wrote:
> Here is the proposal that I came up with many months (a few years?) ago
> with input from Matt Thomas. I have tried to account for Matt's
> requirements, but I'm not sure that I have done so.

I have some questions about your bus_msi(9) proposal. Could you answer
the following questions?

Q1: Wouldn't a simple API similar to the current pci(9) be a good idea?

It seems the proposed bus_msi(9) API is very different from the current
pci(9) and bus_dma(9) APIs. Implementing clean MSI/MSI-X code is a good
idea, but I think it may take a long time for device drivers to support
MSI/MSI-X. I think using a simple API as a first step would let us get
there faster. What do you think of my proposal (at the end of this mail),
which I used for my prototype?

The remaining questions are not a big deal compared to the above.

Q2: Could you show me an example of a device driver's attach function?
I guess bus_msi(9) is used in a device driver's attach function,
but I cannot write pseudo-code with bus_msi(9) because I do not
understand it well enough...
So, could you show me an example attach function that uses it?

Q3: What is "bus_msi_interval_t"?
I guess the type will be like this:
typedef struct _bus_msi_interval_t {
     uint32_t mi_addr;
     uint32_t *mi_data; /* array of mi_data, the length is mi_count. */
     uint32_t mi_count;
} bus_msi_interval_t;
Is this correct?

Q4: How is bus_msi_map() used?
Is it for bus_msi_trigger() only?
If not, could you tell me how to use it? IIUC the content mapped to kvap
(MSI message data) is MD, and manipulating the content is also MD.
So I think we need other MI APIs for it. Is that right?

Q5: How is bus_msi_trigger() used?
I guess the API is used to generate an interrupt by software.
I think it is useful for debugging.
Is this correct?


Thanks,


========== begin of pci_msi(9) proposal ==========
PCI_MSI(9) NetBSD Kernel Developer's Manual PCI_MSI(9)

SYNOPSIS
    int
    pci_msi_count(struct pci_attach_args *pa);

    int
    pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
        int *count);

    void
    pci_msi_release(void **cookie, int count);

    void *
    pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
        int (*func)(void *), void *arg);

    void
    pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);

    int
    pci_msix_count(struct pci_attach_args *pa);

    int
    pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
        int *count);

    void
    pci_msix_release(void **cookie, int count);

    void *
    pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
        int (*func)(void *), void *arg);

    void
    pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);

    int
    pci_msix_remap(pci_intr_handle_t ih, int vector); /* not yet */

    pci_intr_string() can be used just as in pci(9).
    [implementation note: pci_intr_string() needs to be modified for MSI/MSI-X]
    Furthermore, bus_dma(9) can be used just as with pci(9).


DATA TYPES
    same as pci(9).


FUNCTIONS
    int
    pci_msi_count(struct pci_attach_args *pa);
        Return the device's MSI vector count.
        return value:
            The MSI vector count that the `pa' device supports. If the device
            does not support MSI, return 0. On error, return 0.
        pa:
            Parameter of device_attach().

    int
    pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
        int *count);
        Allocate MSI resources and register MSI. This function is used like
        pci_intr_map() of pci(9).
        return value:
            If success, return 0. If it cannot allocate resources,
            return non-zero. If error, return non-zero.
        pa:
            Parameter of device_attach().
        ihps:
            Array of interrupt handles.
            [implementation note: an interrupt handle consists of the
             following]
                a) the "virtual IRQ number" (like the IRQ number for normal
                   interrupts) used in FreeBSD
                b) MPSAFE flag
                c) MSI flag (new)
        count:
            The MSI vector count that the device driver wants. `count' must
            be a power of 2. `count' may be changed on return if the OS
            cannot allocate the requested `count' but can allocate fewer.
            For example, if the input `count' is 4, on return `count' may
            have changed to 2 or 1.
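
            The in/out behaviour of `count' described above could be modelled
            roughly as follows. This is only a sketch under assumptions:
            msi_grant_count() and its `available' parameter are hypothetical
            names that are not part of the proposal, and a real
            implementation would consult the MD vector allocator instead.

```c
#include <assert.h>

/* Return nonzero if n is a positive power of two. */
static int
is_pow2(int n)
{
    return n > 0 && (n & (n - 1)) == 0;
}

/*
 * Hypothetical model of how pci_msi_alloc() might adjust `count':
 * `requested' is the driver's input count, `available' is the number
 * of vectors the system can still hand out (an assumption of this
 * sketch). Returns the granted count, or 0 if nothing can be granted
 * or the request is not a power of two.
 */
static int
msi_grant_count(int requested, int available)
{
    int grant;

    if (!is_pow2(requested) || available < 1)
        return 0;

    /* Halve the request until it fits, so the result stays a power of 2. */
    grant = requested;
    while (grant > available)
        grant /= 2;

    assert(is_pow2(grant));
    return grant;
}
```

            With this model, a request for 4 vectors is granted as 4, 2 or 1
            depending on how many vectors are free, which matches the example
            above.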

    void
    pci_msi_release(void **cookie, int count);
        Release MSI resources and unregister MSI.
        cookie:
            Interrupt handle, which is the first element of `ihps'.
        count:
            Allocated MSI vector count.

    void *
    pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
        int (*func)(void *), void *arg);
        Register an interrupt handling function and establish each MSI.
        Parameters are the same as pci_intr_establish().
        If the device driver uses multiple MSIs, it must call
        pci_msi_establish() for each MSI.
        When pci_msi_establish() completes, MSIs are assigned to CPUs
        round-robin. The round-robin is system-wide. For example, if the
        system has 4 CPUs (cpuid is [0-3]):
            device1 MSI1: cpu1
            device1 MSI2: cpu2
            device2 MSI1: cpu3
            device2 MSI2: cpu0
            device2 MSI3: cpu1
            device2 MSI4: cpu2
        [TODO: implement interrupt affinity (aka interrupt routing)]
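
        The system-wide round-robin above could be sketched like this.
        It is only an illustration: struct rr_state, rr_init() and
        rr_assign() are hypothetical names, and starting at cpu1 is an
        assumption taken from the example table, not a requirement.

```c
#include <assert.h>

/* Hypothetical system-wide state shared by all devices. */
struct rr_state {
    int ncpu;       /* number of CPUs taking part in the rotation */
    int next_cpu;   /* CPU that the next established MSI will get */
};

static void
rr_init(struct rr_state *rr, int ncpu, int first_cpu)
{
    assert(ncpu > 0 && first_cpu >= 0 && first_cpu < ncpu);
    rr->ncpu = ncpu;
    rr->next_cpu = first_cpu;
}

/* Return the CPU for one newly established MSI and advance the rotation. */
static int
rr_assign(struct rr_state *rr)
{
    int cpu = rr->next_cpu;

    rr->next_cpu = (rr->next_cpu + 1) % rr->ncpu;
    return cpu;
}
```

        Because the state is shared rather than per-device, device2 simply
        continues where device1 stopped, as in the table above.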

    void
    pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);
        unregister interrupt handling function, and disestablish MSI.
        parameters are the same as pci_intr_disestablish().

    int
    pci_msix_count(struct pci_attach_args *pa);
        Almost the same as pci_msi_count().
        The only difference is that it returns the MSI-X vector count.

    int
    pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
        int *count);
        Almost the same as pci_msi_alloc(). The two differences are:
        (1) It allocates MSI-X resources.
        (2) `count' is not restricted to a power of 2; on return it may be
            changed to any number less than the input `count'.

    void
    pci_msix_release(void **cookie, int count);
        Same as pci_msi_release().
        [implementation note: just calls pci_msi_release()]

    void *
    pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int level,
        int (*func)(void *), void *arg);
        Almost the same as pci_msi_establish(). The only difference is that
        the interrupt is established as MSI-X.

    void
    pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);
        Almost the same as pci_msi_disestablish(). The only difference is
        that the interrupt is disestablished as MSI-X.

    int
    pci_msix_remap(pci_intr_handle_t ih, int vector); /* not yet */
        Remap an MSI-X table entry.
        [TODO: implement this function.]


EXAMPLE:
    Example of NIC multi-queue using MSI-X. If the driver cannot use MSI-X,
    it tries MSI. If it cannot use MSI either, it tries a legacy interrupt.

#define RX_QUEUE_NUM 5 /* or use online cpus */
#define TX_QUEUE_NUM 5 /* or use online cpus */
#define RX_INTR_FIRST_INDEX 0
#define TX_INTR_FIRST_INDEX RX_QUEUE_NUM
#define EV_INTR_INDEX (RX_QUEUE_NUM+TX_QUEUE_NUM)

static void
foo_device_attach(device_t parent, device_t self, void *aux)
{
    struct foo_softc *sc = device_private(self);
    struct pci_attach_args *pa = aux;
    /* the last "1" is needed for events such as link state changes */
    int want_msix_count = RX_QUEUE_NUM + TX_QUEUE_NUM + 1;
    int want_msi_count = 1; /* MSI cannot handle multi queue well */
    int count, error, i;
    pci_intr_handle_t *ihs;
    pci_intr_handle_t ih;

    if (pci_msix_count(pa) < want_msix_count) {
        goto use_msi;
    }
    count = want_msix_count;
    error = pci_msix_alloc(pa, &ihs, &count);
    if (error || count != want_msix_count) {
        /* XXX a partial allocation should be released before falling back */
        goto use_msi;
    }
    /* RX */
    for (i = RX_INTR_FIRST_INDEX; i < RX_INTR_FIRST_INDEX + RX_QUEUE_NUM; i++) {
        sc->sc_ih[i] = pci_msix_establish(pa->pa_pc, ihs[i], IPL_NET,
            foo_rx_intr, sc);
        if (sc->sc_ih[i] == NULL)
            goto error;
    }
    /* TX */
    for (i = TX_INTR_FIRST_INDEX; i < TX_INTR_FIRST_INDEX + TX_QUEUE_NUM; i++) {
        sc->sc_ih[i] = pci_msix_establish(pa->pa_pc, ihs[i], IPL_NET,
            foo_tx_intr, sc);
        if (sc->sc_ih[i] == NULL)
            goto error;
    }
    /* event */
    sc->sc_ih[EV_INTR_INDEX] = pci_msix_establish(pa->pa_pc,
        ihs[EV_INTR_INDEX], IPL_NET, foo_event_intr, sc);
    if (sc->sc_ih[EV_INTR_INDEX] == NULL) {
        goto error;
    }
    sc->sc_intr_type = INTR_TYPE_MSIX;
    goto done_establish;

use_msi:
    if (pci_msi_count(pa) < want_msi_count) {
        goto use_legacy;
    }
    count = want_msi_count; /* reset: count still holds the MSI-X request */
    error = pci_msi_alloc(pa, &ihs, &count);
    if (error || count != want_msi_count) {
        goto use_legacy;
    }
    sc->sc_ih[0] = pci_msi_establish(pa->pa_pc, ihs[0], IPL_NET,
        foo_rx_tx_ev_intr, sc);
    if (sc->sc_ih[0] == NULL) {
        goto error;
    }
    sc->sc_intr_type = INTR_TYPE_MSI;
    goto done_establish;

use_legacy:
    error = pci_intr_map(pa, &ih);
    if (error) {
        goto error;
    }
    sc->sc_ih[0] = pci_intr_establish(pa->pa_pc, ih, IPL_NET,
        foo_rx_tx_ev_intr, sc);
    if (sc->sc_ih[0] == NULL) {
        goto error;
    }
    sc->sc_intr_type = INTR_TYPE_LEGACY;

done_establish:
    /*
     * bus_dma(9) setting, etc ...
     */
    return; /* do not fall through into the error path */

error:
    /*
     * error handling
     */
}
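
The vector layout that the example relies on (RX vectors first, then TX
vectors, then one event vector) can be checked with a small standalone
sketch. The helpers foo_vec_kind_of() and foo_vec_queue() are hypothetical
and exist only for illustration; the macros are copied from the example.

```c
#include <assert.h>

#define RX_QUEUE_NUM 5
#define TX_QUEUE_NUM 5
#define RX_INTR_FIRST_INDEX 0
#define TX_INTR_FIRST_INDEX RX_QUEUE_NUM
#define EV_INTR_INDEX (RX_QUEUE_NUM + TX_QUEUE_NUM)

enum foo_vec_kind { FOO_VEC_RX, FOO_VEC_TX, FOO_VEC_EVENT, FOO_VEC_INVALID };

/* Classify a vector index according to the layout used in the example. */
static enum foo_vec_kind
foo_vec_kind_of(int idx)
{
    if (idx < RX_INTR_FIRST_INDEX || idx > EV_INTR_INDEX)
        return FOO_VEC_INVALID;
    if (idx == EV_INTR_INDEX)
        return FOO_VEC_EVENT;
    if (idx >= TX_INTR_FIRST_INDEX)
        return FOO_VEC_TX;
    return FOO_VEC_RX;
}

/* Return the RX or TX queue number served by a queue vector. */
static int
foo_vec_queue(int idx)
{
    enum foo_vec_kind kind = foo_vec_kind_of(idx);

    /* Only RX and TX vectors belong to a queue. */
    assert(kind == FOO_VEC_RX || kind == FOO_VEC_TX);
    if (kind == FOO_VEC_RX)
        return idx - RX_INTR_FIRST_INDEX;
    return idx - TX_INTR_FIRST_INDEX;
}
```

With 5 RX and 5 TX queues, indexes 0-4 are RX, 5-9 are TX and 10 is the
event vector, so the driver asks for 11 MSI-X vectors in total.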
========== end of pci_msi(9) proposal ==========

-- 
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>


