tech-kern archive


Re: RFC: add MSI/MSI-X support to NetBSD



On Fri, Jun 06, 2014 at 12:40:54PM -0500, David Young wrote:
> On Fri, May 30, 2014 at 05:55:25PM +0900, Kengo NAKAHARA wrote:
> > Hello,
> > 
> > I'm going to add MSI/MSI-X support to NetBSD, and I have listed the
> > tasks involved.  Would you comment on the following task list?
> 
> I think that MSI/MSI-X logically separates into a few pieces.  What do
> you think about these pieces?
> 
> 1 An MI API for establishing "mailboxes" (or "doorbells" or whatever
>   we may call them).  A mailbox is a special physical address (PA) or
>   PA/data-pair in correspondence with a callback (function, argument).
> 
>   An MI API for mapping the mailbox into various address spaces,
>   but especially the message-signalling devices.  In this way, the
>   mailbox API is a use or an extension of bus_dma(9).
> 
>   Somewhere I have a draft proposal for this MI API, I will try to
>   dig it up.

Here is the proposal that I came up with many months (a few years?) ago
with input from Matt Thomas.  I have tried to account for Matt's
requirements, but I'm not sure that I have done so.

Dave

-- 
David Young
dyoung%pobox.com@localhost    Urbana, IL    (217) 721-9981
BUS_MSI(9)             NetBSD Kernel Developer's Manual             BUS_MSI(9)

bus_msi(9) is a machine-independent interface for establishing, in the
machine's physical address space, a "doorbell" that, when written with
a particular word, sends an interrupt vector to a set of CPUs.  Using
bus_msi(9), each interrupt vector can be tied to interrupt handlers.
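As a rough user-space model only (not kernel code, and not part of the proposed API), a doorbell can be pictured as a table from message data words to handlers.  The sketch below assumes one doorbell serving a contiguous range of NVEC data words beginning at `base'; struct doorbell, doorbell_write() and count_intr() are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define NVEC 8  /* data words served by the modelled doorbell (assumption) */

/* Hypothetical model of one doorbell: data words base..base + NVEC - 1. */
struct doorbell {
        uint32_t base;                  /* first message data word served */
        int (*func[NVEC])(void *);      /* handler per data word, or NULL */
        void *arg[NVEC];                /* argument passed to func[v] */
};

/* Example handler: count how many times the vector was delivered. */
static int
count_intr(void *arg)
{
        (*(int *)arg)++;
        return 1;
}

/*
 * Model a device writing `word' to the doorbell: find the handler bound to
 * that data word and run it.  Return the handler's result, or -1 if the
 * word is outside the doorbell's range or no handler is bound to it.
 */
int
doorbell_write(const struct doorbell *db, uint32_t word)
{
        uint32_t v;

        if (word < db->base)
                return -1;
        v = word - db->base;
        if (v >= NVEC || db->func[v] == NULL)
                return -1;
        return db->func[v](db->arg[v]);
}
```

In this model a device "rings" vector 2 of a doorbell whose base is 0x40 by writing 0x42; words with no bound handler are simply ignored.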

bus_msi(9) is the basis for a machine-independent implementation
of PCI Message-Signaled Interrupts (MSI) and MSI-X; the bus_msi(9)
implementation itself, however, is highly machine-dependent.  Any
NetBSD architecture that wants to support PCI MSI should provide a
bus_msi(9) implementation.
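To show where a message address/data pair ends up on the PCI side, here is a sketch that fills one MSI-X table entry.  It assumes a plain array stands in for the device's memory-mapped MSI-X table (a real driver would write through bus_space(9)), that bus_addr_t is 64 bits wide, and it uses a hypothetical helper, msix_entry_fill().  The dword layout (message address low, message address high, message data, vector control with bit 0 as the mask bit) follows the PCI MSI-X table entry format.

```c
#include <stdint.h>

/* Dword offsets within one 16-byte MSI-X table entry. */
#define MSIX_ENTRY_ADDR_LO      0
#define MSIX_ENTRY_ADDR_HI      1
#define MSIX_ENTRY_DATA         2
#define MSIX_ENTRY_VECCTL       3
#define MSIX_VECCTL_MASK        0x1u    /* Vector Control: Mask Bit */

/*
 * Fill one MSI-X table entry from a message address/data pair.  The entry
 * is left masked; the caller would clear MSIX_VECCTL_MASK once a handler
 * has been established for the vector.
 */
void
msix_entry_fill(uint32_t entry[4], uint64_t addr, uint32_t data)
{
        entry[MSIX_ENTRY_ADDR_LO] = (uint32_t)addr;
        entry[MSIX_ENTRY_ADDR_HI] = (uint32_t)(addr >> 32);
        entry[MSIX_ENTRY_DATA] = data;
        entry[MSIX_ENTRY_VECCTL] = MSIX_VECCTL_MASK;
}
```

Leaving the entry masked until a handler exists is a design choice of this sketch, so that a premature write cannot deliver an interrupt nobody is waiting for.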

bus_msi(9) uses facilities provided by bus_dma(9).

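typedef struct _bus_msi_t {
        bus_addr_t      mi_addr;        /* message address */
        uint32_t        mi_data;        /* first message data word */
        uint32_t        mi_count;       /* number of consecutive data words */
} bus_msi_t;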

int
bus_msi_alloc(bus_dma_tag_t tag, bus_msi_reservation_t *msirp, size_t n,
    uint32_t data_min, uint32_t data_max,
    uint32_t data_alignment, uint32_t data_boundary, int flags);

        Reserve `n' interrupt vectors and corresponding message
        address/data pairs, and record the reservation at `*msirp'.
        The reserved pairs are grouped into intervals, each described
        by a bus_msi_t; use bus_msi_extract() to retrieve them.

        Each bus_msi_t gives a message address, mi_addr, and the
        mi_count different 32-bit message data words,
        [mi_data, mi_data + mi_count - 1], to write to trigger
        mi_count different interrupt vectors.

        Each message data interval, [mi_data, mi_data + mi_count - 1],
        will satisfy the constraints passed to bus_msi_alloc():
        [data_min, data_max] must enclose each interval, each
        interval must start at a multiple of data_alignment, and
        no interval may cross a data_boundary boundary.  A legal
        value of data_alignment (or data_boundary) is either zero
        or a power of 2.  When zero, data_alignment (or data_boundary)
        has no effect.

        `tag' is the bus_dma_tag_t passed by the parent bus driver in
        its attach arguments (for example, pa_dmat in struct
        pci_attach_args).

        `flags' is either BUS_DMA_WAITOK or BUS_DMA_NOWAIT.
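The interval constraints above can be checked mechanically.  This C sketch uses a hypothetical helper, msi_interval_ok(), and assumes that an empty interval (count 0) or an illegal alignment or boundary is simply rejected; it only makes the rules concrete and says nothing about how an allocator should search for intervals.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is zero or a power of 2, the legal alignment/boundary values. */
static bool
zero_or_pow2(uint32_t v)
{
        return (v & (v - 1)) == 0;
}

/*
 * Check a message data interval [data, data + count - 1] against the
 * constraints passed to bus_msi_alloc().  A zero alignment or boundary
 * imposes no constraint.
 */
bool
msi_interval_ok(uint32_t data, uint32_t count,
    uint32_t data_min, uint32_t data_max,
    uint32_t data_alignment, uint32_t data_boundary)
{
        uint64_t last;

        if (count == 0 || !zero_or_pow2(data_alignment) ||
            !zero_or_pow2(data_boundary))
                return false;
        last = (uint64_t)data + count - 1;

        /* [data_min, data_max] must enclose the interval. */
        if (data < data_min || last > data_max)
                return false;

        /* The interval must start at a multiple of data_alignment. */
        if (data_alignment != 0 && (data & (data_alignment - 1)) != 0)
                return false;

        /*
         * The interval must not cross a data_boundary boundary: its first
         * and last words must lie in the same boundary-sized block.
         */
        if (data_boundary != 0 &&
            (data & ~(uint64_t)(data_boundary - 1)) !=
            (last & ~(uint64_t)(data_boundary - 1)))
                return false;

        return true;
}
```

For example, [0x1e, 0x21] is rejected with a boundary of 0x20 because it straddles 0x20, and any interval of two or more words is rejected with a boundary of 1.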

bus_msi_handle_t
bus_msi_establish(bus_dma_tag_t tag, bus_msi_reservation_t msir, int idx,
    const kcpuset_t *cpusetin, int ncpumax, kcpuset_t *cpusetout,
    int ipl, int (*func)(void *), void *arg);

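        Establish a callback (func, arg) to run at interrupt priority
        level `ipl' whenever the `idx'th message in the reservation
        `msir' is delivered.  Establish the interrupt vector on up to
        `ncpumax' CPUs in the set `cpusetin', and overwrite `cpusetout'
        with the set of CPUs where it was established.  Return an
        opaque handle for use with bus_msi_disestablish().

        More than one handler may be established at each `idx'.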

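        The correspondence between `idx's and message address/data
        pairs is as follows, where intervals[0], intervals[1], ...
        are the bus_msi_ts recorded in `msir', in order,
        N = intervals[0].mi_count, and K = intervals[1].mi_count: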

        idx 0         -> (intervals[0].mi_addr, intervals[0].mi_data)
        idx 1         -> (intervals[0].mi_addr, intervals[0].mi_data + 1)
        . . .
        idx N - 1     -> (intervals[0].mi_addr, intervals[0].mi_data +
                                                intervals[0].mi_count - 1)
        idx N         -> (intervals[1].mi_addr, intervals[1].mi_data)
        idx N + 1     -> (intervals[1].mi_addr, intervals[1].mi_data + 1)
        . . .
        idx N + K - 1 -> (intervals[1].mi_addr, intervals[1].mi_data +
                                                intervals[1].mi_count - 1)
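The idx-to-pair table amounts to a walk over the intervals in order.  A minimal sketch, assuming a stand-in struct msi_interval that mirrors bus_msi_t (with bus_addr_t taken to be 64 bits) and a hypothetical function msi_idx_to_pair():

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for bus_msi_t; bus_addr_t is assumed to be 64 bits wide. */
struct msi_interval {
        uint64_t mi_addr;
        uint32_t mi_data;
        uint32_t mi_count;
};

/*
 * Translate a reservation-wide index into its message address/data pair
 * by walking the intervals in order.  Return 0 on success, or -1 if idx
 * lies beyond the last interval.
 */
int
msi_idx_to_pair(const struct msi_interval *iv, size_t niv, uint32_t idx,
    uint64_t *addrp, uint32_t *datap)
{
        size_t i;

        for (i = 0; i < niv; i++) {
                if (idx < iv[i].mi_count) {
                        *addrp = iv[i].mi_addr;
                        *datap = iv[i].mi_data + idx;
                        return 0;
                }
                idx -= iv[i].mi_count;  /* skip past this interval */
        }
        return -1;
}
```

With intervals of 4 and 2 words, idx 0 through 3 fall in intervals[0] and idx 4 and 5 in intervals[1], matching N = 4 and K = 2 in the table.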

void
bus_msi_disestablish(bus_dma_tag_t tag, bus_msi_handle_t);

        Disestablish the callback identified by the bus_msi_handle_t
        that bus_msi_establish() returned.

void
bus_msi_free(bus_dma_tag_t tag, bus_msi_reservation_t msir, int idx, size_t n);

        Release the `n' message address/data pairs of `msir',
        starting with the `idx'th, that were reserved with
        bus_msi_alloc().

        The behavior of bus_msi_free(9) is undefined if callbacks
        are still established on any of the released pairs.

int
bus_msi_extract(bus_dma_tag_t tag, bus_msi_reservation_t msir,
    int idx, size_t n, bus_msi_t *msip, int *rmsi);

        Extract `n' MSIs from `msir', starting with the `idx'th,
        and write them to `msip'.  Record the number actually
        extracted at `*rmsi'.

int
bus_msi_to_segs(bus_dma_tag_t tag, bus_msi_t *msip, size_t n,
    bus_dma_segment_t *segs, int nsegs, int *rsegs);

        Create an array of at most `nsegs' bus_dma_segment_ts at
        `segs' from the message addresses in the `n' bus_msi_ts at
        `msip'.  Record the number of segments written at `*rsegs'.

int
bus_msi_map(bus_dma_tag_t tag, bus_msi_reservation_t, uint32_t **kvap,
    size_t n);

        Map `n' message addresses into kernel virtual address space,
        recording virtual addresses at `kvap[0..n-1]'.

        [Implementation note: use bus_dmamem_map(9).]

int
bus_msi_unmap(bus_dma_tag_t tag, uint32_t **kvap, size_t n);

        Unmap `n' message addresses, `kvap[0..n-1]'.

        [Implementation note: use bus_dmamem_unmap(9).]

int
bus_msi_trigger(bus_dma_tag_t tag, bus_msi_reservation_t, int idx);

        Post the `idx'th message of the reservation.  Behavior is
        undefined if no callback has been established on the
        `idx'th message using bus_msi_establish(9).

        The established callback may run either before or after
        bus_msi_trigger(9) returns.

        [Implementation note: use bus_msi_extract(9), bus_msi_map(9),
        *kvap = extracted_interval.mi_data, bus_msi_unmap(9).]

EXAMPLES

        /*
         * Allocate N vectors for MSI on any one CPU, and return the
         * message address at msiaddrp and the base message data at
         * msidatap.  `tag' is the bus_dma_tag_t from the attach args.
         */

        int
        msi_allocate(bus_dma_tag_t tag, int n, bus_addr_t *msiaddrp,
            uint32_t *msidatap)
        {
                bus_msg_interval_t intervals;
                int rc, rintervals;

                rc = bus_msg_alloc(tag, n, kcpuset_running, 1, NULL,
                    &intervals, 1, &rintervals, 0, UINT16_MAX, 4, 0,
                    BUS_DMA_WAITOK);

                if (rc != 0)
                        return rc;

                *msiaddrp = intervals.mi_addr;
                *msidatap = intervals.mi_data;
                return 0;
        }

        /*
         * Allocate N vectors for MSI-X on different CPUs in round-robin
         * fashion, return the message-address/data pairs at msiaddrp
         * and msidatap.
         */

        int
        msix_allocate(bus_dma_tag_t tag, int n, bus_addr_t *msiaddrp,
            uint32_t *msidatap)
        {
                kcpuset_t *anykcp, *estkcp;
                bus_msg_interval_t *intervals;
                const size_t sz = n * sizeof(*intervals);
                int i, rc, rintervals;

                /* A KM_SLEEP allocation waits for memory; it cannot fail. */
                intervals = kmem_zalloc(sz, KM_SLEEP);

                /* kcpuset_create(9) returns void; no error to check. */
                kcpuset_create(&estkcp, false);
                kcpuset_create(&anykcp, true);  /* starts out empty */

                for (i = 0; i < n; i++) {
                        /*
                         * If we've emptied our "any CPUs" set,
                         * refill it.
                         */
                        if (kcpuset_iszero(anykcp))
                                kcpuset_copy(anykcp, kcpuset_running);

                        rc = bus_msg_alloc(tag, 1, anykcp, 1, estkcp,
                            &intervals[i], 1, &rintervals, 0, UINT32_MAX, 0, 0,
                            BUS_DMA_WAITOK);

                        if (rc != 0)
                                goto err;

                        /*
                         * The CPU where we established the interrupt
                         * is temporarily ineligible.
                         */
                        kcpuset_remove(anykcp, estkcp);

                        /* Remember this vector's message address/data pair. */
                        msiaddrp[i] = intervals[i].mi_addr;
                        msidatap[i] = intervals[i].mi_data;
                }

                kmem_free(intervals, sz);
                kcpuset_destroy(estkcp);
                kcpuset_destroy(anykcp);
                return 0;
        err:
                /* Release the vectors reserved before the failure. */
                while (--i >= 0)
                        bus_msg_free(tag, intervals[i], 1);
                kmem_free(intervals, sz);
                kcpuset_destroy(estkcp);
                kcpuset_destroy(anykcp);
                return rc;
        }
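The round-robin policy in msix_allocate() can be modelled outside the kernel.  This sketch assumes a CPU set is a 32-bit mask (bit c = CPU c) instead of a kcpuset_t, and that the allocator picks the lowest-numbered eligible CPU; that choice is an assumption standing in for whatever bus_msg_alloc() actually selects.  rr_assign() is a hypothetical name.

```c
#include <stdint.h>

/*
 * User-space model of the CPU choice made in msix_allocate().  For each
 * of n vectors: refill the eligible set from `running' once it is empty,
 * take the lowest-numbered eligible CPU, and make that CPU ineligible
 * until the next refill.
 */
void
rr_assign(uint32_t running, int n, int *cpu_of_vec)
{
        uint32_t any = 0;
        int i, c;

        if (running == 0)
                return;                 /* no CPUs: nothing to assign */

        for (i = 0; i < n; i++) {
                if (any == 0)
                        any = running;  /* refill the "any CPUs" set */
                for (c = 0; (any & (UINT32_C(1) << c)) == 0; c++)
                        continue;       /* find the lowest eligible CPU */
                cpu_of_vec[i] = c;
                any &= ~(UINT32_C(1) << c);
        }
}
```

With CPUs 0, 1 and 3 running, five vectors land on CPUs 0, 1, 3, 0, 1: every CPU receives a vector before any CPU receives a second one.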

        /*
         * Allocate N vectors for MSI-X on any CPUs.
         * Return the message-address/data pairs at msiaddrp
         * and msidatap.
         */

        int
        msix2_allocate(bus_dma_tag_t tag, int n, bus_addr_t *msiaddrp,
            uint32_t *msidatap)
        {
                bus_msg_interval_t *intervals;
                const size_t sz = n * sizeof(*intervals);
                int i, rc, rintervals;

                /* A KM_SLEEP allocation waits for memory; it cannot fail. */
                intervals = kmem_zalloc(sz, KM_SLEEP);

                rc = bus_msg_alloc(tag, n,
                    kcpuset_running, kcpuset_countset(kcpuset_running),
                    NULL,
                    intervals, n, &rintervals, 0, UINT32_MAX,
                    0, /* alignment */
                    1, /* boundary: 0 would also be legal, but then one
                        * interval could cover several consecutive mi_data
                        * words and the loop below would have to expand
                        * it.  A boundary of 1 forces exactly one
                        * address/data pair into each of the n intervals.
                        *
                        * Perhaps this is too clever.
                        */
                    BUS_DMA_WAITOK);

                if (rc != 0)
                        goto err;

                for (i = 0; i < n; i++) {
                        msiaddrp[i] = intervals[i].mi_addr;
                        msidatap[i] = intervals[i].mi_data;
                }

                kmem_free(intervals, sz);
                return 0;
        err:
                kmem_free(intervals, sz);
                return rc;
        }
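The effect of the data_boundary rule used in msix2_allocate() can be seen by splitting a run of data words at every boundary multiple.  This is only a model of the rule, not of how bus_msg_alloc() chooses words; msi_split() is a hypothetical helper, and it assumes `boundary' is zero or a power of 2.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split the data words [data, data + count - 1] into the fewest intervals
 * that do not cross a `boundary' boundary, writing each interval's start
 * and length to starts[] and lens[].  A zero boundary means no splitting.
 * Return the number of intervals produced (0 for an empty range), or 0 if
 * more than `max' intervals would be needed.
 */
size_t
msi_split(uint32_t data, uint32_t count, uint32_t boundary,
    uint32_t *starts, uint32_t *lens, size_t max)
{
        size_t n = 0;
        uint64_t cur = data, end = (uint64_t)data + count;

        while (cur < end) {
                uint64_t stop = end;

                if (boundary != 0) {
                        /* First multiple of boundary above cur. */
                        uint64_t next = (cur | (boundary - 1)) + 1;

                        if (next < stop)
                                stop = next;
                }
                if (n == max)
                        return 0;
                starts[n] = (uint32_t)cur;
                lens[n] = (uint32_t)(stop - cur);
                n++;
                cur = stop;
        }
        return n;
}
```

With a boundary of 1 every interval holds a single word, which is why intervals[i] in msix2_allocate() describes exactly vector i; with a boundary of 0 the same four words form one interval.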

