Current-Users archive

Re: WANTED: nvme(4) driver testing on MP systems on -current



Hey,

thank you. This iostat_unbusy panic is a typical symptom of the current
MP issues: the command completion queue gets corrupted, and
nvme_q_complete() delivers some commands twice. This causes either this
panic (due to a duplicate lddone() for a stale buf) or a random kernel
crash.
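
To illustrate the failure mode, here is a minimal, self-contained model of
an NVMe-style completion queue that is consumed with a phase tag. It is
not the nvme(4) driver code: every name in it (struct cq, cq_post(),
cq_complete(), the inflight[] table) is hypothetical, and the in-flight
check is only one way to make a duplicate delivery visible, not the fix
being worked on. It shows how a stale head index makes the consumer walk
entries it has already handled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CQ_ENTRIES 4u	/* hypothetical queue depth */
#define MAX_CID    8u	/* hypothetical number of command slots */

/* Simplified completion entry; bit 0 of flags is the phase tag. */
struct cq_entry {
	uint16_t cid;	/* command identifier */
	uint16_t flags;	/* bit 0: phase tag written by the device */
};

struct cq {
	struct cq_entry ring[CQ_ENTRIES];
	uint32_t head;			/* next slot the host inspects */
	uint16_t phase;			/* phase value marking a new entry */
	uint8_t inflight[MAX_CID];	/* 1 while a command awaits completion */
	unsigned done[MAX_CID];		/* times each command was completed */
	unsigned duplicates;		/* stale entries that were rejected */
};

static void
cq_init(struct cq *q)
{
	memset(q, 0, sizeof(*q));
	q->phase = 1;	/* ring starts zeroed, so the first pass expects 1 */
}

/* Stand-in for the device posting a completion into a slot. */
static void
cq_post(struct cq *q, uint32_t slot, uint16_t cid, uint16_t phase)
{
	assert(slot < CQ_ENTRIES);
	q->ring[slot].cid = cid;
	q->ring[slot].flags = phase & 1u;
}

/* Consume all new entries; returns how many commands were completed. */
static unsigned
cq_complete(struct cq *q)
{
	unsigned n = 0;

	for (;;) {
		struct cq_entry *e = &q->ring[q->head];

		if ((e->flags & 1u) != q->phase)
			break;	/* no new entry at head */

		if (e->cid < MAX_CID && q->inflight[e->cid]) {
			q->inflight[e->cid] = 0;
			q->done[e->cid]++;	/* deliver exactly once */
			n++;
		} else {
			q->duplicates++;	/* would be a second delivery */
		}

		if (++q->head == CQ_ENTRIES) {
			q->head = 0;
			q->phase ^= 1u;	/* expected phase flips on wrap */
		}
	}
	return n;
}
```

In this model, if head is reset to an earlier slot after entries were
consumed, the phase tags of those slots still match and the loop visits
them again. Without the inflight[] check each of those visits would
complete the same command a second time, which corresponds to the
duplicate lddone() described above.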

I've been debugging this for the past two weeks or so. I have some
local changes (mainly some volatile qualifiers) which seem to fix this
issue, at least on my MP VirtualBox test machine. But these changes
still do not fix the issue completely on another, real system I have
access to. I guess it would be useful to at least share the ongoing
work. I'll polish and commit what I have today or tomorrow.

Jaromir

2016-10-18 10:40 GMT+02:00 Masanobu SAITOH <msaitoh%execsw.org@localhost>:
> On 2016/09/22 5:54, Jaromír Doleček wrote:
>>
>> Hello,
>>
>> The NVMe driver in NetBSD-current was recently tweaked to fix several MP and
>> locking
>> issues, and the driver is now marked as MPSAFE by default.
>>
>> Most of this work was done on emulators since I lack the hardware,
>> so it's not clear if
>> everything would work properly on real systems too.
>>
>> Anyone having the hardware, I'd appreciate if you could check the
>> driver out, and try
>> to punish the drive by some heavy I/O test with parallel load if
>> possible, and report
>> results.
>>
>> The driver should work on i386 and amd64, and is enabled in
>> INSTALL/GENERIC kernels there,
>> so you could just try to boot install iso from NetBSD daily builds,
>> and send-pr any
>> issues.
>>
>> I'd also especially welcome it if someone with a sparc64 system could test
>> the driver out, too.
>> The driver originates from OpenBSD where nvme(4) is enabled in GENERIC
>> sparc64
>> kernel, so it should work. But it was not confirmed yet on
>> NetBSD/sparc64. Note you might
>> need a fairly modern system; at least some Intel NVMe cards require PCIe
>> Generation 3 to
>> actually work, so this rules out e.g. T1s.
>>
>> I'd also very much welcome any benchmark results; it would be very
>> interesting to share some
>> IOPS figures.
>>
>> Let me know the results; I'd like to update the driver manpage to list
>> known working hardware.
>>
>> In any reports, please include the attachment fragment from dmesg, as
>> there
>> is quite a significant difference between attachment via apic/INTx and
>> MSI/MSI-X.
>> Also useful would be intrctl(8) output, to confirm interrupt handlers
>> are dispatched
>> properly to individual available CPUs.
>>
>> Thank you.
>>
>> Jaromir
>>
>
> With nvme.c rev. 1.16:
>
>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN
>
>
> and,
>
>> five# crash -M netbsd.36.core -N /netbsd
>> Crash version 7.99.39, image version 7.99.39.
>> System panicked: iostat_unbusy
>> Backtrace from time of crash is available.
>> crash> trace
>> _KERNEL_OPT_NVGA_RASTERCONSOLE() at 0
>> ?() at ffff80008f0e5240
>> vpanic() at vpanic+0x149
>> snprintf() at snprintf
>> iostat_isbusy() at iostat_isbusy
>> dk_done1() at dk_done1+0xab
>> lddone() at lddone+0xf
>> nvme_q_complete() at nvme_q_complete+0xc6
>> softint_dispatch() at softint_dispatch+0xd3
>> DDB lost frame for Xsoftintr+0x4f, trying 0xfffffe810e919ff0
>> Xsoftintr() at Xsoftintr+0x4f
>> --- interrupt ---
>> 0:
>
>
> Again, the panic message was:
>
>> Oct 18 17:14:02 five savecore: reboot after panic: panic:
>> ioWsAtRNatI_NWG:Au nRSNPILN GbNuO:Ts  SLPOyLW E RN
>
>
> -> panic: iostat_unbusy
> -> WARNINWG:A RSNPILN GNO:T  SLPOLW E RN
>
>   -> WARNING: SPL NOT LOWER
>   -> WARNING: SPL N
>
> The full dmesg is at:
>
>         http://www.netbsd.org/~msaitoh/nvme-20161018-0.log
>
> Any test code is welcome!
>
> --
> -----------------------------------------------
>                 SAITOH Masanobu (msaitoh%execsw.org@localhost
>                                  msaitoh%netbsd.org@localhost)


