Re: RFC: MSI/MSI-X implementation
Hi,
(2014/11/14 6:51), David Young wrote:
On Thu, Nov 13, 2014 at 01:59:09PM -0600, David Young wrote:
On Thu, Nov 13, 2014 at 12:41:38PM +0900, Kengo NAKAHARA wrote:
(2014/11/13 11:54), David Young wrote:
On Fri, Nov 07, 2014 at 04:41:55PM +0900, Kengo NAKAHARA wrote:
Could you comment the specification and implementation?
The user should not be on the hook to set processor affinity for the
interrupts. That is more properly the responsibility of the designer
and OS.
My explanation was unclear..., so please let me restate it.
This MSI/MSI-X API *design* is independent of processor affinity.
The device drivers can use MSI/MSI-X and processor affinity
independently of each other. In other words, legacy INTx interrupts
can still use processor affinity. Furthermore, MSI/MSI-X may or may
not use processor affinity.
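To illustrate the independence, here is a minimal sketch (the
pci_msix_* names are placeholders rather than the final API, and
intr_distribute() is the affinity API described further below; its
signature is guessed for illustration):

    pci_intr_handle_t *ihs;
    void *cookie;
    int error;

    /* Step 1: allocate MSI-X vectors.  No CPU is implied here. */
    error = pci_msix_alloc_exact(pa, &ihs, nvectors); /* placeholder */
    if (error)
        return error;

    /* Step 2: establish the handler.  Still no CPU is implied. */
    cookie = pci_msix_establish(pc, ihs[0], IPL_NET, xxx_intr, sc);

    /*
     * Step 3 (optional): give the vector affinity to a CPU.  An
     * INTx handler cookie could be passed to the same call, so
     * affinity is orthogonal to the interrupt type.
     */
    intr_distribute(cookie, cpu_lookup(0));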
MSI/MSI-X is not half as useful as it ought to be if a driver's author
cannot spread interrupt workload across the available CPUs. If you
don't mind, please share your processor affinity proposal and show how
it works with interrupts.
Here are some cases that interest me:
Well..., before that, please let me confirm my understanding of
general NIC multiqueue implementations.
The number of Rx queues enabled by a multiqueue NIC device driver is
equal to the number of CPUs. Similarly, the number of Tx queues is
equal to the number of CPUs.
# Of course, if the NIC does not have enough queues, the number of
# queues used is limited by the hardware.
Furthermore, how queues are bound to MSI-X vectors depends on the
implementation of each device driver.
[example A]
Some device drivers, such as FreeBSD's if_vmx, establish an MSI-X
vector for each Rx queue and each Tx queue. Then each MSI-X vector
is given affinity to a CPU round-robin, in the FreeBSD style (a
sketch of this loop follows example B below).
+ MSI-X vector setting
- Tx0 is bound to the "irq256" (*1) interrupt
- Tx1 is bound to the "irq257" interrupt
- Tx2 is bound to the "irq258" interrupt
- Tx3 is bound to the "irq259" interrupt
- Rx0 is bound to the "irq260" interrupt
- Rx1 is bound to the "irq261" interrupt
- Rx2 is bound to the "irq262" interrupt
- Rx3 is bound to the "irq263" interrupt
+ affinity setting
- irq256 (for Tx0) has affinity to CPU0
- irq257 (for Tx1) has affinity to CPU1
- irq258 (for Tx2) has affinity to CPU2
- irq259 (for Tx3) has affinity to CPU3
- irq260 (for Rx0) has affinity to CPU0
- irq261 (for Rx1) has affinity to CPU1
- irq262 (for Rx2) has affinity to CPU2
- irq263 (for Rx3) has affinity to CPU3
# note: Tx0 and Rx0 happen to have affinity to the same CPU
(*1) "irq[256-512]" is the FreeBSD-style name for an MSI/MSI-X vector
[example B]
Some device drivers, such as if_igb, establish an MSI-X vector for
each Rx/Tx queue pair. Then each MSI-X vector is likewise given
affinity to a CPU round-robin, in the FreeBSD style.
+ MSI-X vector setting
- Tx0 and Rx0 are bound to the "irq256" interrupt
- Tx1 and Rx1 are bound to the "irq257" interrupt
- Tx2 and Rx2 are bound to the "irq258" interrupt
- Tx3 and Rx3 are bound to the "irq259" interrupt
+ affinity setting
- irq256 (for Tx0 and Rx0) has affinity to CPU0
- irq257 (for Tx1 and Rx1) has affinity to CPU1
- irq258 (for Tx2 and Rx2) has affinity to CPU2
- irq259 (for Tx3 and Rx3) has affinity to CPU3
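In both examples the CPU choice reduces to round-robin over the
vector index. A minimal sketch of that loop (cookies[] is a
hypothetical array of the handles returned at establish time; ncpu
and cpu_lookup() are the usual NetBSD ones, and intr_distribute()'s
signature is guessed as above):

    /*
     * Distribute nvec MSI-X vectors over the CPUs round-robin.
     * With one vector per queue (example A), Tx and Rx vectors
     * land on CPUs independently; with one vector per Tx/Rx pair
     * (example B), each pair shares a CPU.
     */
    for (int i = 0; i < nvec; i++)
        intr_distribute(cookies[i], cpu_lookup(i % ncpu));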
Do I understand the above correctly?
1) What interrupts does a driver establish if the NIC has separate
MSI/MSI-X interrupts for each of 4 Tx DMA rings and each of 4 Rx DMA
rings, and there are 2 logical CPUs? Can/does the driver provide
any hints about the processor that is the target of each interrupt?
What CPUs receive the interrupts?
I think a typical NIC driver enables only 2 Tx DMA rings and 2 Rx
DMA rings on a system with 2 logical CPUs. To achieve this, the
device driver should use a kernel API which returns the number of
online CPUs, as in the sketch below.
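For example (a minimal sketch; sc_max_hw_rings is a hypothetical
softc member, and I use the ncpu global from <sys/cpu.h> as the
"online CPU number" API):

    #include <sys/param.h>      /* MIN() */
    #include <sys/cpu.h>        /* ncpu */

    /*
     * Enable one Tx/Rx ring pair per logical CPU, capped by the
     * number of rings the hardware actually provides.
     */
    int nrings = MIN(sc->sc_max_hw_rings, ncpu);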
In my MSI/MSI-X implementation, the interrupts are always given
affinity to CPU0.
# If CPU0 does not have enough resources (i.e. cpu_info.ci_isources
# on x86), the interrupts are given affinity to CPU1 instead.
Using the intr_distribute() API, which is my IRQ affinity
implementation, device drivers can set affinity to whatever CPUs the
driver author likes. Furthermore, the system administrator can
change processor affinity with intrctl(9) if the driver's default
affinity is not to his or her liking.
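For example, the administrator side might look like this (a sketch
only; the exact intrctl syntax and the interrupt name here are
illustrative):

    # list interrupts and their current CPU affinity
    intrctl list
    # move the MSI-X vector named "msix2 vec 0" to CPU 2
    intrctl affinity -i "msix2 vec 0" -c 2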
2) Same as above, but what if there are 4 logical CPUs?
This case is similar to 1); the only difference is that the device
driver uses 4 Rx/Tx rings.
Without intr_distribute(), the interrupts are likewise given
affinity to CPU0. With intr_distribute(), the device driver can set
affinity as the author likes, and the system administrator can set
affinity with intrctl(9) as he or she likes.
3) Same as previous, but what if there are 16 logical CPUs?
In this case, the device driver can use only 4 Rx/Tx rings due to
the hardware limit. I think the device driver may set affinity in
either of the following 2 patterns.
[pattern 1]
- the Tx0 interrupt is given affinity to CPU0
- the Tx1 interrupt is given affinity to CPU1
- the Tx2 interrupt is given affinity to CPU2
- the Tx3 interrupt is given affinity to CPU3
- the Rx0 interrupt is given affinity to CPU4
- the Rx1 interrupt is given affinity to CPU5
- the Rx2 interrupt is given affinity to CPU6
- the Rx3 interrupt is given affinity to CPU7
[pattern 2]
- the Tx0 interrupt is given affinity to CPU0
- the Tx1 interrupt is given affinity to CPU1
- the Tx2 interrupt is given affinity to CPU2
- the Tx3 interrupt is given affinity to CPU3
- the Rx0 interrupt is given affinity to CPU0
- the Rx1 interrupt is given affinity to CPU1
- the Rx2 interrupt is given affinity to CPU2
- the Rx3 interrupt is given affinity to CPU3
The driver author should measure the performance of each pattern to
decide which one to use; a sketch of both patterns follows.
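The two patterns differ only in the Rx index arithmetic. A minimal
sketch (tx_cookie[]/rx_cookie[] are hypothetical per-vector handles;
intr_distribute()'s signature is guessed as before):

    for (int i = 0; i < 4; i++) {
        /* both patterns: the Txi interrupt goes to CPUi */
        intr_distribute(tx_cookie[i], cpu_lookup(i));
    #ifdef PATTERN1
        /* pattern 1: the Rxi interrupt goes to CPU(i + 4) */
        intr_distribute(rx_cookie[i], cpu_lookup(i + 4));
    #else
        /* pattern 2: the Rxi interrupt shares CPUi with Txi */
        intr_distribute(rx_cookie[i], cpu_lookup(i));
    #endif
    }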
There's more than one way to crack this nut; I'm just wondering how
you propose to crack it. :-)
I think device driver authors or system administrators will crack
it; I would implement the kernel APIs and userland commands they
need to do so. In other words, to be honest, I have no single idea
that resolves every situation smartly...
Thanks,
--
//////////////////////////////////////////////////////////////////////
Internet Initiative Japan Inc.
Device Engineering Section,
Core Product Development Department,
Product Division,
Technology Unit
Kengo NAKAHARA <k-nakahara%iij.ad.jp@localhost>