tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: RFC: PERCPU_LIST to fix PR kern/52515



On Fri, Oct 6, 2017 at 4:24 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
> On Fri, Oct 6, 2017 at 1:14 PM, Ryota Ozaki <ozaki-r%netbsd.org@localhost> wrote:
>> On Fri, Oct 6, 2017 at 11:58 AM, Taylor R Campbell
>> <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>>>> Date: Fri, 6 Oct 2017 11:26:40 +0900
>>>> From: Ryota Ozaki <ozaki-r%netbsd.org@localhost>
>>>>
>>>> On Mon, Oct 2, 2017 at 11:13 PM, Taylor R Campbell
>>>> <campbell+netbsd-tech-kern%mumble.net@localhost> wrote:
>>>> > Quick summary of the problem:
>>>> >
>>>> > Possible solutions.  I'm leaning toward (6), to open-code the linked
>>>> > list operations for this special purpose, with compile-tested patch
>>>> > attached.  This changes the text of psref.h, but shouldn't change the
>>>> > ABI.  Comments?
>>>>
>>>> How about using SLIST instead of open-coding? The instructions of them
>>>> are very similar, but the SLIST version is slightly simpler.
>>>
>>> I avoided that because n psref_release operations takes expected and
>>> worst-case O(n^2) time and there's no constant bound on the latency of
>>> a single psref_release operation.  But maybe n is always small enough
>>> that it doesn't matter -- perhaps enough that the concrete cost of
>>> maintaining a doubly-linked list is higher.
>>
>> I also suppose that a target being released is at the head of the list
>> because targets of psref are typically manipulated in last-in-first-out
>> manner. But yes psref_release of SLIST version can be O(n) in worst
>> cases theoretically.
>>
>> Well, when I think this kind of problems I tend to think of our usages,
>> i.e., network processing as a router. In such usages, psref is used in
>> softint and the last-in-first-out manner would keep in many cases. But
>> in other usages such as uses in threads the assumption is perhaps not
>> really.
>>
>>>
>>> (My desire to avoid thinking about bounds on n is also what motivated
>>> me to use a linked list instead of an array in the first place.)
>>>
>>> Note that your patch changes the ABI of struct psref!
>>
>> Let's bump the kernel version!
>>
>>>
>>> I wonder whether the open-coded version would do better if it
>>> unconditionally loaded the percpu:
>>>
>>>         pcpu = percpu_getref(class->prc_percpu);
>>>         KASSERTMSG(psref->psref_prevp == NULL || *psref->psref_prevp == psref,
>>>             "psref %p prevp %p points at %p",
>>>             psref, psref->psref_prevp, *psref->psref_prevp);
>>>         KASSERTMSG(psref->psref_prevp != NULL || pcpu->pcpu_first == psref,
>>>             "psref %p marked as first but psref_cpu %p on %d first is %p",
>>>             psref, pcpu, cpu_index(curcpu()), pcpu->pcpu_first);
>>>         *(psref->psref_prevp ? psref->psref_prevp : &pcpu->pcpu_first) =
>>>             psref->psref_next;
>>>         percpu_putref(class->prc_percpu);
>>>
>>> With DIAGNOSTIC disabled, I get a conditional move instruction instead
>>> of branches this way:
>>>
>>>  4f9:   e8 00 00 00 00          callq  4fe <psref_release+0x93>
>>>                         4fa: R_X86_64_PC32     percpu_getref+0xfffffffffffffffc
>>>  4fe:   48 8b 53 08             mov    0x8(%rbx),%rdx
>>>  502:   48 85 d2                test   %rdx,%rdx
>>>  505:   48 0f 44 d0             cmove  %rax,%rdx
>>>  509:   48 8b 03                mov    (%rbx),%rax
>>>  50c:   48 89 02                mov    %rax,(%rdx)
>>>  50f:   49 8b 7c 24 20          mov    0x20(%r12),%rdi
>>>  514:   e8 00 00 00 00          callq  519 <psref_release+0xae>
>>>                         515: R_X86_64_PC32     percpu_putref+0xfffffffffffffffc
>>>
>>> Also, my original patch was missing a percpu_putref.  I hope you put
>>> it back before you ran your test!
>>
>> I'll test with the patch in the other mail later.
>
> The above results were actually measured by knakahara@ and
> this time I've measured by myself (the measurement hardware
> is the same and the kernel configs of his and mine should
> be almost the same though).

I notice a difference between his and mine: check-out date of
the source code. Mine is checked out today while his is probably
some days ago.

>
> ...so the results show a slightly different trend:
>   - original:  137.9 138.0 (144.7) Mbps
>   - open-code: 135.2 134.6 (138.5) Mbps
>   - SLIST:     140.7 141.4 (140.1) Mbps

...though still these results are puzzling.

  ozaki-r

>
> I think knakahara@ (or someone else) needs to re-evaluate :-/
>
>   ozaki-r


Home | Main Index | Thread Index | Old Index