Port-xen archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: interrupt cleanup #1



Taylor R Campbell <campbell+netbsd-port-xen%mumble.net@localhost> writes:

>> Date: Thu, 28 Jun 2018 17:54:25 +0530
>> From: "Cherry G. Mathew" <cherry%zyx.in@localhost>
>> 
>> As promised, but a little later than I thought - please find a patch
>> below. There are no functional changes, and this is just a first cut.
>
> This is a complicated patch to review, so I tried to reconstruct a
> path of smaller refactorings that are clearly not functional changes
> that lead to it.  However, I got stuck.  I'm attaching two files:
>
> - incremental.mbox: a series of patches in mbox format (from `git
>   format-patch --stdout') that should make no functional changes.
>
>   (Can apply it to src.git with `git am < incremental.mbox'.)
>
> - remainder.patch: a patch from the tree obtained by applying
>   incremental.patch on HEAD, to the tree obtained by applying your
>   original patch on HEAD, about half the size of the original patch.
>
>   (Can it apply it to src.git with `git apply < remainder.patch',
>   after applying incremental.box.)
>

Thank you Taylor - you're right that it is a long patch - the intention
of the single blob was to get people to try it out first so that I could
make sure that I hadn't broken anything , before breaking it down into a
set of smaller changes. 

However, since you've put the effort to break it down, I think I'll
spend some time explaining all your questions, which will also help me
revisit the rationale behind the changes.

> It seems unlikely to me that your patch has no functional changes,
> since, e.g., it uses a new hypercall that was not previously anywhere
> in the tree, EVTCHNOP_bind_vcpu.
>

I think I agree - I introduced this to conform the "addroute()" api to
its semantic in native. #if 0/#endif -ing it should not cause any
problems.

> Maybe this code is not yet reachable so far, but this isn't obvious.
> Maybe these actually have no functional change given the behaviour of
> the hypervisor, but even so, changing the outward-facing interfaces
> that the code invokes stretches the notion of `nonfunctional'.
>
> I got stuck because there is a complicated dance that appears to
> simultaneously:
>
> - add, on top of pins, destinations, irqs, event channels, ports, and
>   vectors, a new concept of `tokens' that isn't immediately clear to
>   me, and change the interpretations of some or of these in light of
>   the new concept of `tokens'; and
>
> - rotate logic between xen_pirq_alloc, xen_pic_addroute/delroute,
>   ioapic_addroute/delroute, isa_intr_establish_xname, and
>   intr_establish_xname.
>

Yes, I'm myself trying to untangle the mess, so these are useful
questions. Let me try below.

> Can you explain:
>
> 1. What are the _old_ definitions of `pin', `destination', `irq',
>    `event channel', `port', and `vector', with reference to whose
>    concepts they are (Intel's? Xen's? ours?), how we use or pun them,
>    and what APIs handle them?  Which of them can hold the same type of
>    object, but merely indicate that it is being used for a different
>    role?

On domU, there is only one handle that refers to a unique "source" of an
interruption of a VCPU. It's called a "port" or an "event channel",
interchangeably. Thus if you see 'port' or 'evtchn', you can more or
less assume they're the same thing. (Within the Xen src code itself,
they seem to try to make a distinction by referring to the evtchn as
some sort of opaque object that contains meta data to facilitate the
existence of a port.)

On dom0, Xen is faced with the problem of mapping real world interrupts
via hardware pins, which may or may not be directed at specific hardware
CPUs to the Xen abstraction of 'port' and 'VCPU'. The way this is
currently assumed to be done by us is as follows. 

i) The OS requests Xen for a physical CPU vector allocation  via a
   "PHYSDEVOP_ASSIGN_VECTOR" hypercall. The input parameter to this call
   is a legacy "irq" between 0 - 255. The idea is that we are asking for
   a mapping between a hardware "irq" number and physical cpu vector.

   An "irq" on x86, is a number between 0-255 that represents an offset
   into the CPU interrupt vector table - sortof an index into a table of
   callbacks. On the first IBM PCs these were mapped 1:1 with the
   physical pins, and numbered 1-15. These are specifically special
   cased in our code. Note that the key takeaway from this description
   is that an irq number is identical to a pin number. Also note that
   with this architecture the mapping between irq numbers and cpu
   vectors could be mapped as two 8 vector sized blocks. Software had no
   control over individual pin->irq mappings, but the irq->vector
   mappings were programmable as two blocks starting at two fixed
   offsets within the (separate) vector table index namespace (0-255).

   When "Advanced" PICS made an appearance, the idea of an irq was
   divorced from the idea of a pin. Now you could have a true cascade of
   PICs, and interrupts were canonically referred to by the (pin, pic)
   tuple rather than 'irq'. Further, the exact physical CPU vector, and
   the destination CPU (in an MP system) could be specified for
   individual interrupts. In other words (pin,pic)->vector mappings were
   possible without the notion of an 'irq' which came to be seen as a
   'legacy' idea.

   However, for backwards compatibility, x86 software still maintains a
   pin->irq mapping for the first 1-15 "irq"s (by programming the
   corresponding I/O Redirection Table entries in the corresponding
   PIC). Specifically, NetBSD maintains this fiction by a bitfield
   mechanism defined in x86/include/i82093var.h

   The (pin,pic)->cpuvector mappings are programmed via
   ioapic.c:ioapic_addroute()/ioapic_delroute()

   We also maintain the notion of a generic "pic" via the data structure
   x86/include/pic.h:struct pic;

   This is poorly abstracted IMO for eg: there is no design for
   cascading PICs (which were a fundamental idea even from the original
   PC XT architecture). The only useful abstractions there really are a
   mechanism to hide callbacks for manipulating "route"s. Even in this
   case, the idea of what constitutes a route is inspired by IOAPICs but
   poorly designed from a semantic perspective. I will come to this
   later when addressing a later question below.

   So in summary, we have "irq"s to represent an offset index into a CPU
   interrupt vector table in the legacy case, and (pin, pic) and a set
   of conventional (pin,pic)->irq mappings and (pin,pic)->irq->vector
   mappings in the ioapic case. We differentiate these poorly via
   bitfield mappings in i82093var.h via a set of macro definitions.


   Enter XEN.

http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/arch/x86/physdev.c;hb=437211cb696515ee5bd5dae0ab72866c9f382a33
"
 392         /* Vector is only used by hypervisor, and dom0 shouldn't
 393            touch it in its world, return irq_op.irq as the vecotr,
 394            and make this hypercall dummy, and also defer the vector 
 395            allocation when dom0 tries to programe ioapic entry. */
 396         irq_op.vector = irq_op.irq;
 397         ret = 0;
"


   What's implicit here is that at some point, they used the entire
   0-255 irq namespace (instead of the restricted 1-16 namespace that
   native uses) to present to domains a "handle" to the fictitious "irq"
   idea. This irq is called a "pseudo-irq" (I'll call it 'pirq' going
   forward) and our Xen implementation freely assumes (correctly or
   incorrectly) the following about this namespace:

   i) The pirq namespace is in the range 0-255 (8bits)
   ii) The pirq block 1-15 coincides with the legacy irq block and is
       thus identically mapped to the underlying XEN legacy irq
       fictional mappings via APIC programming at boot.
   iii) Any pirq from 16-200 can map to cpu vectors that are requestable
        from XEN. We make an assumption that XEN doesn't like requests
        for pirq bindings to cpu vectors in the 200-255 range. 
   iv) We request XEN to allocate a cpu vector for us which is in a
       separate namespace (also 0-255), which we maintain a handle to
       via the pirq number that we gave the requesting
       PHYSDEVOP_ASSIGN_VECTOR hypercall. Note that the consumer of the
       returned cpu vector will be ioapic.c:add_route(...,idtvec,...);
    v) The pirq namespace itself has no inherent semantic other than the
       special case of irq (1-15) whose irq->vector mappings themselves
       are pre-reserved and NOT requestable in the case of APIC support
       being available in the kernel, while in the case of no APIC
       support, they remain fixed, but vector alloc requests are
       available from XEN.
    vi) The cpu vector namespace is used by the dom0 to request:
           vector->(pin,pic) bindings via the PHYSDEVOP_APIC_WRITE
           hypercall in xen/include/i82093var.h:ioapic_write_ul()
    vii) The pirq namespace is further used to allocate an event
         connected to it, and bind them to each other (thus when the pin
         fires, the physical CPU vector callback which is managed by XEN
         activates, and we get a Xen event routed to us). The
         alloc+binding for this is achieved in
         xen/xen/evtchn.c:bind_pirq_to_evtch()
    viii) With APIC support, the pirq namespace can be combined with the
          (pin, pic) tuple, by abusing the bitfield encoding for the
          native "legacy" irq  mechanism originally meant for passing
          around identity mapped irq<->pin conventions + (pin, pic)
          mapping info.

There were a couple of things that I was deeply uncomfortable with from
the above.

a) the 'pirq' namespace itself. Apart from the special case with native,
   it really has nothing to do with irqs. So I decided to name them
   "token" to give them as abstract a meaning as possible. When the need
   arose, I would then explicitly assign the token to a semantically
   meaningful variable name such as "irq" in the case of legacy irqs
   1-15, for eg: (I just noticed that I have used this semantically
   incorrectly in my patch xen_pirq_alloc())

b) The blatant conditional aliasing of pirq and cpu vector namespaces
   (legitimised by the xen source code comment snippet I pasted above)
   while using them in completely separate and unrelated ways:
   - to represent pirq->event bindings
   - to represent (pin,pic)->xenvector

c) Our bitfield use, legitimising the idea that 'pseudo-irq' and legacy
   'irq' are identical namespaces.

   In reality, they are special cased for a narrow range (1-15) while
   being just a handle to xen (which happens to be identity mapped with
   the requested cpu vector) in the case of non 'legacy'.


In response, I decided to do the following:

a) Treat the requesting pirq as an opaque "token"
b) At the point of return of the cpu vector by xen, save it as a
   key-value pair so that it can be looked up when required by the key
   "token".
c) Special case the token as semantically equivalent to a native irq in
   the cases where they were used as such (I felt that a separate
   key-value pair implementation was overkill for this case).

d) Maintain the fiction of " 'pirq' and legacy irq share the same
   namespace" wrt our bitfielding via x86/include/i82093var.h macros.

   Note that this followed from the special case decision in 'c)'
   above.

   I note that there are some inconsistencies in my patch wrt to the
   above.

>
> 2. What are the _new_ definitions of `pin', `destination', `irq',
>    `event channel', `port', `vector', and `token', with reference to
>    whose concepts they are, how we use or pun them, and what APIs
>    handle them?  Which of them can hold the same type of object, but
>    merely indicate that it is being used for a different role?
>

None of these definitions change intrinsically. The only new addition is
the 'token' which can represent a key to lookup an xen allocated
physical cpu 'vector' to us, and in specific cases, a legacy irq in the
range 1-15.


> 3. What does establishing an interrupt handler of each of the various
>    kinds that we care about (Xen event channel, ISA interrupt, &c.)
>    entail from end to end, if the circuitous maze of abstractions is
>    stripped away?  What _should_ it entail from end to end, if the
>    maze of abstractions weren't getting in our way?
>

I introduced the concept of a struct pic; of type XEN_PIC.

The idea here was to wedge in PIC cascading in the struct pic
construct. My idea is that every pic device is represented by a struct
pic; (this is already done in the case of the MSI implementation on
native, for eg:). 

Thus on dom0 we can abstract the interrupting management entitities into
roughly two.
        a) APIC pin->vector route management
        b) The vector->port route management
Note that in this cascading model, the input to the APIC would be a pic,
the output a vector. (The current (*addroute) call has no way to tell us
what the out put is, because cascading is not supported).
The input to the XEN pic would be a vector, and its output a port.

ie; diagrammatically:

h/w pin----->|APIC pic|----->cpu vector---->|XEN pic|---->xen port

Unfortunately, XEN doesn't give us much flexibility in case b). The API
unifies the request for the port allocation with that of the vector
binding. (See bind_pirq_to_evtchn() ). This makes b) a bit redundant.
Further, 'vector' seemed to be an x86+apic specific thing that I thought
should remain within the APIC route management construct in a)

Thus I modified the entities as follows:
     a) APIC pin->vector route management
     b) 1. port->port route management (trivial)
        2. APIC pin->vector route management via "a)" above.

I abused the struct pic; to force it into a cascaded usecase.

Ideally what should have happened if struct pic had cascading support
would have been in intr.c:

a) vector = apic->(*addroute)(ioapic /*src*/,
                              lapic/*dst*/, 
                              pin, vector, type);
b) xenpic->(*addroute)(lapic /*src*/, 
                       xenpic /*dst*/, 
                       vector, port, type);

Instead what happens now via my patch is:

a) xenpic->(*addroute)(apic, /* cascaded */
                       cpu, /* redundant */
                       irq, /* via abused bitfielded
                             * (pin, pic, legacy)
                             * tuple
                             */
                       -1, /* We don't know the port yet */
                       type)
b) inside the xen pic addroute:
   infer the pin, pic from bitfielded irq above.
   apic->(*addroute)(/* redundant */apic, cpu, pin, vector, type)
   do the vector<->port binding and save corresponding values in
                       keyvalue pairs indexed by 'token'.

This compromise would allow XEN code to be resected from the APIC code
and things to be independently cleaned up once the struct pic was
redesigned to be truly cascadable.


> 4. What does EVTCHNOP_bind_vcpu do, and why do we use it?  Can it be
>    done in a separate commit, or is it somehow necessary to use it in
>    order for this commit to have no functional changes?
>

We can take it out - it adds no function other than to enforce the
port->port routing on calling vCPU. I think we can keep it out of this
patch.

> 5. Why is the body of bind_pirq_to_evtch partially copied in
>    xen_pic_addroute?  Likewise unbind_pirq_from_evtch in
>    xen_pic_delroute?
>

bind_pirq_to_evtchn() and friends were created because of the way
pirq->port binding is currently exposed. The way I want to take intr
reorg for xen, these functions should no longer exist. I wanted to make
the first step in this direction by bypassing their usage.

>
> There are some parts of remainder.patch that look suspicious to me.
> For example:
>
>> diff --git a/sys/arch/xen/include/intr.h b/sys/arch/xen/include/intr.h
>> index 9d15e97d4781..1e350bf0e290 100644
>> --- a/sys/arch/xen/include/intr.h
>> +++ b/sys/arch/xen/include/intr.h
>> @@ -72,7 +72,7 @@ struct xen_pic {
>>  	 * Later we want this to be a lookup for MSI etc.
>>  	 * See: intr.c: intr_establish_xname() for usage eg:
>>  	 */
>> -	int irq2port[NR_EVENT_CHANNELS]; /* actually port + 1, so that 0 is invaid */
>> +	int irq2port[256]; /* actually port + 1, so that 0 is invaid */
>>  };
>
> But maybe this is not problematic, if an `irq' is no longer an `event
> channel' but instead something else.
>

This is a fix for a bug that I introduced previously. While the array
values represent the corresponding ports, the *index* is a pirq/token
which is in a 8bit long namespace.

I hope this provides some more clarity. Thanks for these questions.

It would be really valuable for me to have these patches tested on a
native machine in as many dom0 configurations as possible
((i386,pae),amd64).

Many Thanks,

-- 
~cherry


Home | Main Index | Thread Index | Old Index