tech-kern archive


RFC: MSI/MSI-X implementation



Hi,

I have implemented MSI/MSI-X support on top of my IRQ affinity code.
# here is the mail about the IRQ affinity code
#     http://mail-index.netbsd.org/tech-kern/2014/09/12/msg017653.html

Here is the implementation,
    https://github.com/knakahara/netbsd-src/tree/rfc/msi-msix

and here are the patches,
    (1) http://knakahara.github.io/patches/netbsd/msi-msix-support-01-irq-affinity.patch
        IRQ affinity code with some bug fixes
    (2) http://knakahara.github.io/patches/netbsd/msi-msix-support-02-main.patch
        main MSI/MSI-X support code
    (3) http://knakahara.github.io/patches/netbsd/msi-msix-support-03-fix-build-failure.patch
        tiny patch to fix build failure

Using these APIs, if_vmx can use multiqueue as shown below.
# hikaru@n.o implemented the if_vmx multiqueue code, thanks
#     https://github.com/knakahara/netbsd-src/tree/k-nakahara-msi-msix-proto2-test-vmx
====================
# intrctl list
 interrupt name CPU#00(+)       CPU#01(+)
 ioapic0 pin 9         0*              0        unknown
 ioapic0 pin 1         0*              0        unknown
 ioapic0 pin 12        0*              0        unknown
 ioapic0 pin 14        0*              0        unknown
 ioapic0 pin 15        6*              0        unknown
 ioapic0 pin 17    82321*              0        unknown
 ioapic0 pin 16       17*              0        unknown
 msix0 vec 0       11935*              0        vmx0: tx 0
 msix0 vec 1           0*              0        vmx0: tx 1 (*1)
 msix0 vec 2       14895*              0        vmx0: rx 0
 msix0 vec 3        1904*              0        vmx0: rx 1
 msix0 vec 4           0*              0        vmx0: link
 ioapic0 pin 19        0*              0        unknown
 ioapic0 pin 7         0*              0        unknown
 ioapic0 pin 4         0*              0        unknown
 ioapic0 pin 3         0*              0        unknown
 ioapic0 pin 6         0*              0        unknown
====================
(*1) This if_vmx implementation uses multiqueue only on the receive side.
     if_vmx creates and establishes "tx 1", but does not use it.

Of course, affinity can be set for MSI/MSI-X vectors just like for normal
interrupts.
====================
# sh intrctl affinity -i 'msix0 vec 2' -c 1
# sh intrctl affinity -i 'msix0 vec 3' -c 1
(send and receive some files)

# intrctl list
 interrupt name CPU#00(+)       CPU#01(+)
 ioapic0 pin 9         0*              0        unknown
 ioapic0 pin 1         0*              0        unknown
 ioapic0 pin 12        0*              0        unknown
 ioapic0 pin 14        0*              0        unknown
 ioapic0 pin 15        6*              0        unknown
 ioapic0 pin 17    82668*              0        unknown
 ioapic0 pin 16       49*              0        unknown
 msix0 vec 0       14772*              0        vmx0: tx 0
 msix0 vec 1           0*              0        vmx0: tx 1
 msix0 vec 2       15024            2010*       vmx0: rx 0
 msix0 vec 3        1905            1089*       vmx0: rx 1
 msix0 vec 4           0*              0        vmx0: link
 ioapic0 pin 19        0*              0        unknown
 ioapic0 pin 7         0*              0        unknown
 ioapic0 pin 4         0*              0        unknown
 ioapic0 pin 3         0*              0        unknown
 ioapic0 pin 6         0*              0        unknown
====================

Furthermore, I have written a simple (but not brief) addition to the
pci_intr(9) manual. It is included at the end of this mail.

Could you comment on the specification and implementation?

Thanks,

========== MSI/MSI-X API manual ==========
PCI_INTR(9)


SYNOPSIS
/* existing */
    int
    pci_intr_map(const struct pci_attach_args *pa, pci_intr_handle_t *ih);


    const char *
    pci_intr_string(pci_chipset_tag_t pc, pci_intr_handle_t ih, char *buf,
        size_t len);


    void *
    pci_intr_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih, int ipl,
        int (*intrhand)(void *), void *intrarg);


    void
    pci_intr_disestablish(pci_chipset_tag_t pc, void *ih);


/******************************************************************************/
/* new APIs for normal interrupt */
    int
    pci_intr_alloc(const struct pci_attach_args *pa,
        pci_intr_handle_t **pih);


    void
    pci_intr_release(pci_intr_handle_t *pih);


/******************************************************************************/
/* new APIs for MSI */
    int
    pci_msi_count(struct pci_attach_args *pa);


    int
    pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count);


    int
    pci_msi_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int count);


    void
    pci_msi_release(pci_intr_handle_t **pihs, int count);

    void *
    pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg);


    void *
    pci_msi_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);

    void
    pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);


/******************************************************************************/
/* new APIs for MSI-X */
    int
    pci_msix_count(struct pci_attach_args *pa);

    int
    pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count);

    int
    pci_msix_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int count);


    void
    pci_msix_release(pci_intr_handle_t **pihs, int count);


    void *
    pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);

    void *
    pci_msix_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);

    void
    pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);

    int
    pci_msix_remap(pci_intr_handle_t *pihs, int count);


/******************************************************************************/
/* for all interrupt */
    void
    pci_any_intr_disestablish(pci_chipset_tag_t, void *);

    void
    pci_any_intr_release(pci_intr_handle_t **, int);


DESCRIPTION FOR MSI/MSI-X
    The pci_msi and pci_msix functions exist to allow device drivers
    machine-independent access to PCI MSI/MSI-X. The functions described in
    this page are typically declared in a port's <machine/pci_machdep.h>
    header file; however, drivers should generally include
    <dev/pci/pcivar.h> to get other PCI-specific declarations as well.


    If a driver wishes to establish an MSI/MSI-X handler for the device, it
    should pass the struct pci_attach_args * to the pci_msi{,x}_alloc() or
    pci_msi{,x}_alloc_exact() function, which returns zero on success
    and nonzero on failure. The function allocates an array of
    pci_intr_handle_t, stores a pointer to it in the location pointed at by
    its second argument, and sets each element to a machine-dependent value
    which identifies a particular MSI/MSI-X vector.


    If the driver wishes to refer to the interrupt source in an attach or
    error message, it should use the value returned by pci_intr_string().
    This function works for normal interrupts as well as MSI/MSI-X.


    Subsequently, when the driver is prepared to receive interrupts, it
    should call pci_msi{,x}_establish() to actually establish the handler;
    when the MSI/MSI-X vector interrupts, func will be called with
    a single argument arg, and will run at the interrupt priority
    level given by level.


    The return value of pci_msi{,x}_establish() may be saved and passed to
    pci_msi{,x}_disestablish() to disable the interrupt handler when the
    driver is no longer interested in interrupts from the device.


    Device drivers must call pci_msi{,x}_release() to release
    resources after pci_msi{,x}_disestablish().
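
The call order described above (allocate, establish, disestablish, release)
can be sketched as follows. This is a minimal userland model, not kernel
code: the types and the bodies of the pci_msix_*() functions are simplified
stand-ins written for this example so that it compiles on its own, and
struct example_softc, example_intr(), example_attach() and example_detach()
are hypothetical names. Only the order of the calls is taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for kernel types (assumptions for this sketch). */
typedef int pci_intr_handle_t;
typedef int pci_chipset_tag_t;
struct pci_attach_args { pci_chipset_tag_t pa_pc; int hw_vectors; };
#define IPL_NET 1

int nestablished;   /* model bookkeeping: handlers currently established */

/* Stub bodies; in the kernel these would come from the MSI/MSI-X patch. */
int
pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count)
{
    if (pa->hw_vectors <= 0)
        return 1;
    if (*count > pa->hw_vectors)
        *count = pa->hw_vectors;    /* "count" may be decremented */
    *ihps = calloc((size_t)*count, sizeof(**ihps));
    return *ihps == NULL;
}

void *
pci_msix_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
    int level, int (*func)(void *), void *arg, const char *xname)
{
    (void)pc; (void)ih; (void)level; (void)func; (void)xname;
    nestablished++;
    return arg;     /* any non-NULL cookie is enough for the model */
}

void
pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie)
{
    (void)pc;
    assert(cookie != NULL);
    nestablished--;
}

void
pci_msix_release(pci_intr_handle_t **pihs, int count)
{
    (void)count;
    free(*pihs);
    *pihs = NULL;
}

/* Hypothetical driver state and interrupt handler. */
#define EXAMPLE_MAXVEC 4
struct example_softc {
    pci_chipset_tag_t sc_pc;
    pci_intr_handle_t *sc_ihps;
    void *sc_cookies[EXAMPLE_MAXVEC];
    int sc_nintr;
};

int
example_intr(void *arg)
{
    (void)arg;
    return 1;       /* claim the interrupt */
}

/* Attach: allocate vectors first, then establish one handler per vector. */
int
example_attach(struct example_softc *sc, struct pci_attach_args *pa)
{
    int i;

    sc->sc_pc = pa->pa_pc;
    sc->sc_nintr = EXAMPLE_MAXVEC;
    if (pci_msix_alloc(pa, &sc->sc_ihps, &sc->sc_nintr) != 0)
        return 1;
    for (i = 0; i < sc->sc_nintr; i++) {
        sc->sc_cookies[i] = pci_msix_establish_xname(sc->sc_pc,
            sc->sc_ihps[i], IPL_NET, example_intr, sc, "example0");
        if (sc->sc_cookies[i] == NULL) {
            while (--i >= 0)
                pci_msix_disestablish(sc->sc_pc, sc->sc_cookies[i]);
            pci_msix_release(&sc->sc_ihps, sc->sc_nintr);
            return 1;
        }
    }
    return 0;
}

/* Detach: disestablish every handler, then release the vectors. */
void
example_detach(struct example_softc *sc)
{
    int i;

    for (i = 0; i < sc->sc_nintr; i++)
        pci_msix_disestablish(sc->sc_pc, sc->sc_cookies[i]);
    pci_msix_release(&sc->sc_ihps, sc->sc_nintr);
    sc->sc_nintr = 0;
}
```

In this model a device offering three vectors leaves sc_nintr at 3 after
example_attach(), because the stub lowers "count" the way the text says
pci_msix_alloc() may.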


    In addition, if a device driver wants to handle normal interrupts and
    MSI/MSI-X in the same way, it should use pci_intr_alloc() and
    pci_intr_release() instead of pci_intr_map(). pci_intr_alloc()
    allocates a pci_intr_handle_t just as pci_msi{,x}_alloc() does.
    A driver that uses pci_intr_alloc() can then use
    pci_any_intr_disestablish() and pci_any_intr_release().


    Of course, device drivers which do not use MSI/MSI-X can keep using
    pci_intr_map() as before.
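
One way a driver might combine the three allocators is sketched below.
This is a userland model under stated assumptions, not the author's
implementation: the fallback order (MSI-X, then one MSI vector, then a
normal interrupt) is a common pattern rather than something this manual
prescribes, the stub bodies and simplified types exist only so the example
compiles on its own, and example_alloc_any() and enum example_intr_type are
hypothetical names. It also assumes, as the "for all interrupt" heading
suggests, that pci_any_intr_release() accepts handles from any of the
allocators.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-ins for kernel types (assumptions for this sketch). */
typedef int pci_intr_handle_t;
struct pci_attach_args { int msix_vectors; int msi_vectors; };

/* Shared stub allocator: fails when the device offers no such vectors. */
int
stub_alloc(int avail, pci_intr_handle_t **ihps, int *count)
{
    if (avail <= 0)
        return 1;
    if (*count > avail)
        *count = avail;
    *ihps = calloc((size_t)*count, sizeof(**ihps));
    return *ihps == NULL;
}

int
pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count)
{
    return stub_alloc(pa->msix_vectors, ihps, count);
}

int
pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count)
{
    return stub_alloc(pa->msi_vectors, ihps, count);
}

int
pci_intr_alloc(const struct pci_attach_args *pa, pci_intr_handle_t **pih)
{
    int one = 1;

    (void)pa;
    return stub_alloc(1, pih, &one);
}

void
pci_any_intr_release(pci_intr_handle_t **pihs, int count)
{
    (void)count;
    assert(pihs != NULL);
    free(*pihs);
    *pihs = NULL;
}

enum example_intr_type { EX_INTX, EX_MSI, EX_MSIX, EX_NONE };

/*
 * Hypothetical helper: prefer MSI-X, then a single MSI vector, then a
 * normal interrupt. On return *count holds the number of handles.
 */
enum example_intr_type
example_alloc_any(struct pci_attach_args *pa, pci_intr_handle_t **ihps,
    int *count)
{
    if (pci_msix_alloc(pa, ihps, count) == 0)
        return EX_MSIX;
    *count = 1;             /* 1 is a power of 2, as MSI requires */
    if (pci_msi_alloc(pa, ihps, count) == 0)
        return EX_MSI;
    if (pci_intr_alloc(pa, ihps) == 0)
        return EX_INTX;     /* *count is already 1 */
    *count = 0;
    return EX_NONE;
}
```

Whatever type was obtained, the model releases it with the same
pci_any_intr_release() call, which is the convenience the paragraph above
describes.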


FUNCTION
    int
    pci_intr_alloc(const struct pci_attach_args *pa,
        pci_intr_handle_t **pih);
        "pa" is the pci_attach_args passed to the driver's attach function.
        "pih" is a pointer to pci_intr_handle_t *.
        The pci_intr_handle_t is allocated in pci_intr_alloc(), so device
        drivers must call pci_intr_release() or pci_any_intr_release().

    void
    pci_intr_release(pci_intr_handle_t *pih);
        "pih" is a pointer to the pci_intr_handle_t whose resources are
        to be released.

/******************************************************************************/
/* for MSI */
    int
    pci_msi_count(struct pci_attach_args *pa);
        Returns the maximum number of MSI vectors supported by the device,
        in other words the hardware limit of MSI vectors.
        If the device does not support MSI, returns zero.
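
Because an MSI request count must be a power of 2 (see pci_msi_alloc()),
a driver may want to combine the value from pci_msi_count() with the number
of vectors it would like before asking for them. The helper below is a
small sketch of that arithmetic; example_msi_round_count() is a
hypothetical name and is not part of the proposed API.

```c
#include <assert.h>

/*
 * Hypothetical helper: largest power of 2 that is <= both "wanted" and
 * "hwmax", or 0 if no vector can be used. "hwmax" stands for the value
 * returned by pci_msi_count().
 */
int
example_msi_round_count(int wanted, int hwmax)
{
    int limit = wanted < hwmax ? wanted : hwmax;
    int n = 1;

    if (limit < 1)
        return 0;
    while (n <= limit / 2)      /* doubling keeps n <= limit */
        n *= 2;
    assert((n & (n - 1)) == 0); /* n is a power of 2 */
    return n;
}
```

For instance, a driver that would like 5 vectors on a device that supports
32 would request 4.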

    int
    pci_msi_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count);
        This function allocates and sets pci_intr_handle_t.
        "ihps" is a pointer to the array of pci_intr_handle_t allocated by
        this function. "count" is the number of vectors wanted by the
        device driver. If there are not enough resources, "count" may be
        decremented at return time. This function returns zero on success,
        and returns non-zero on failure.
        Due to the PCI specification, "count" must be a power of 2. Even if
        "count" is decremented, it stays within this constraint.

    int
    pci_msi_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int count);
        This function is similar to pci_msi_alloc(); the only difference is
        that "count" is never decremented.

    void
    pci_msi_release(pci_intr_handle_t **pihs, int count);
        "pihs" is a pointer to the array of pci_intr_handle_t whose
        resources are to be released.
        "count" is the number of allocated handles.

    void *
    pci_msi_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg);
        This function is similar to pci_intr_establish(). The return value
        and arguments are the same as for pci_intr_establish(). The only
        difference is that "ih" must be an MSI handle. If "ih" is a normal
        interrupt handle, this function fails.

    void *
    pci_msi_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);
        This function is similar to pci_msi_establish(). The only difference
        is that "xname" is used as the name of the MSI vector.

    void
    pci_msi_disestablish(pci_chipset_tag_t pc, void *cookie);
        This function is similar to pci_intr_disestablish(). The arguments
        are the same as for pci_intr_disestablish(). The only difference
        is that "cookie" must refer to an MSI handler. If "cookie" refers
        to a normal interrupt handler, this function fails.


/******************************************************************************/
/* for MSI-X */
    int
    pci_msix_count(struct pci_attach_args *pa);
        This function is similar to pci_msi_count(). The only difference is
        that it returns the maximum number of MSI-X vectors.


    int
    pci_msix_alloc(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int *count);
        This function is similar to pci_msi_alloc(). There are two
        differences:
            - it allocates handles for MSI-X
            - "count" can be any number greater than zero


    int
    pci_msix_alloc_exact(struct pci_attach_args *pa, pci_intr_handle_t **ihps, int count);
        This function is similar to pci_msi_alloc_exact(). There are two
        differences:
            - it allocates handles for MSI-X
            - "count" can be any number greater than zero


    void
    pci_msix_release(pci_intr_handle_t **pihs, int count);
        This function is a wrapper around pci_msi_release().


    void *
    pci_msix_establish(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);
        This function is similar to pci_msi_establish(). The only difference
        is that "ih" must be an MSI-X handle.
        This function uses the device's MSI-X vector table entries
        contiguously, in order from 0. If a device driver wants to use the
        MSI-X vector table non-contiguously, it should use pci_msix_remap().


    void *
    pci_msix_establish_xname(pci_chipset_tag_t pc, pci_intr_handle_t ih,
        int level, int (*func)(void *), void *arg, const char *xname);
        This function is similar to pci_msi_establish_xname(). The only
        difference is that "ih" must be an MSI-X handle.


    void
    pci_msix_disestablish(pci_chipset_tag_t pc, void *cookie);
        This function is similar to pci_msi_disestablish(). The only
        difference is that "cookie" must refer to an MSI-X handler.


    int
    pci_msix_remap(pci_intr_handle_t *pihs, int count);
        This function remaps MSI-X vector table entries.
        "pihs" is an array of pci_intr_handle_t for MSI-X. "count" is the
        total number of table entries after the remap. This function
        returns zero on success, and returns non-zero on failure
        without changing the MSI-X vector table.
        For example, if a device driver wants to remap as shown below,
                   before                            after
        | index | combined handler |      | index | combined handler |
        +-------+------------------+      +-------+------------------+
        |     0 |          pihs[0] |      |     0 |          pihs[3] |
        +-------+------------------+      +-------+------------------+
        |     1 |          pihs[1] |  ->  |     1 |       (not used) |
        +-------+------------------+      +-------+------------------+
        |     2 |          pihs[2] |      |     2 |       (not used) |
        +-------+------------------+      +-------+------------------+
        |     3 |          pihs[3] |      |     3 |       (not used) |
        +-------+------------------+      +-------+------------------+
                                          |     4 |          pihs[0] |
                                          +-------+------------------+
                                          |     5 |       (not used) |
                                          +-------+------------------+
                                          |     6 |          pihs[1] |
                                          +-------+------------------+
                                              // pihs[2] is disestablished

        the device driver should call this function as follows.
        ====================
            pci_intr_handle_t after[7];
            int ret;

            after[0] = before[3];
            after[1] = MSI_INT_MSIX_INVALID; /* marks an unused entry */
            after[2] = MSI_INT_MSIX_INVALID;
            after[3] = MSI_INT_MSIX_INVALID;
            after[4] = before[0];
            after[5] = MSI_INT_MSIX_INVALID;
            after[6] = before[1];
            ret = pci_msix_remap(after, 7);
            if (ret != 0) {
                /* error handling; the vector table is unchanged */
            } else {
                /* cookie2: value returned when before[2] was established */
                pci_msix_disestablish(pc, cookie2);
            }
        ====================
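
The remap semantics in the table above can be checked with a small
userland model. This is only a sketch of the described behaviour (entries
0..count-1 are replaced, and a failed call leaves the table untouched),
not the kernel implementation. All names starting with ex_ or EX_ are
hypothetical, EX_INVALID stands in for MSI_INT_MSIX_INVALID, and the
failure conditions (a count that does not fit, or the same handle listed
twice) are assumptions chosen for the model.

```c
#include <assert.h>
#include <stddef.h>

#define EX_TABLE_SIZE 8     /* size of the modelled vector table */
#define EX_INVALID    (-1)  /* stands in for MSI_INT_MSIX_INVALID */

typedef int ex_handle_t;    /* stands in for pci_intr_handle_t */

struct ex_msix_table {
    ex_handle_t entry[EX_TABLE_SIZE];
};

/*
 * Model of the remap semantics: install "pihs" as table entries
 * 0..count-1 and mark the remaining entries unused. All checks run
 * before the table is written, so a failure changes nothing.
 */
int
ex_msix_remap(struct ex_msix_table *t, const ex_handle_t *pihs, int count)
{
    int i, j;

    assert(t != NULL && pihs != NULL);
    if (count < 1 || count > EX_TABLE_SIZE)
        return 1;
    for (i = 0; i < count; i++)
        for (j = i + 1; j < count; j++)
            if (pihs[i] != EX_INVALID && pihs[i] == pihs[j])
                return 1;   /* same handle mapped twice */
    for (i = 0; i < EX_TABLE_SIZE; i++)
        t->entry[i] = i < count ? pihs[i] : EX_INVALID;
    return 0;
}
```

Validating every argument before the first write is what lets the model
honour the "without changing the MSI-X vector table" guarantee.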
========== MSI/MSI-X API manual ==========

-- 
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.

Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit

Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>

