Re: kern/52769: hang with an ffs stored in an nvme device

To: gnats-bugs%NetBSD.org@localhost
Subject: Re: kern/52769: hang with an ffs stored in an nvme device
From: Paul Goyette <paul%whooppee.com@localhost>
Date: Fri, 16 Mar 2018 17:07:46 +0800 (+08)

On Wed, 14 Mar 2018, Jaromír Doleček wrote:

>   q_nccbs = 0x20,
>   q_nccbs_avail = 0x21,

This is highly suspicious: q_nccbs_avail should always be <= q_nccbs. It is
good that the driver deadlocked; otherwise it would have panicked in
nvme_ccb_get() as soon as it tried to take the nonexistent 33rd ccb from the
queue :)

This got me thinking though. If the completion queue is being processed, we
currently don't reset q_nccbs_avail until after all finished ccbs are
processed. While this is running, any further I/O would be skipped with
EAGAIN if all ccbs were taken and q_nccbs_avail was 0. By the time the ccb
counter is reset at the end of nvme_q_complete(), there is no outstanding
I/O left that would trigger another lddone() and drain the queue, so the
driver ceases to process anything. This scenario matches the described
symptoms quite well.
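
To make the scenario above concrete, here is a small userland toy model of
it. This is only a sketch and not driver code: struct toy_queue and the
toy_* functions are hypothetical names invented for illustration, and the
"arrivals" parameter is an assumption standing in for I/O that shows up
while the completion queue is being processed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only; every name here is hypothetical, not from the driver. */
struct toy_queue {
	unsigned nccbs;       /* total ccbs owned by the queue */
	unsigned nccbs_avail; /* ccbs the submit path believes are free */
	unsigned inflight;    /* commands submitted, not yet completed */
	unsigned waiting;     /* requests deferred with EAGAIN */
};

/* Submit path: take a ccb, or defer the request with EAGAIN. */
static int
toy_submit(struct toy_queue *q)
{
	if (q->nccbs_avail == 0) {
		q->waiting++;
		return EAGAIN;
	}
	q->nccbs_avail--;
	q->inflight++;
	return 0;
}

/*
 * Completion path with the deferred reset: the counter is restored only
 * after the whole batch has been processed. "arrivals" requests show up
 * while the batch is still being processed.
 */
static void
toy_complete_deferred_reset(struct toy_queue *q, unsigned arrivals)
{
	unsigned done = q->inflight;

	/* Every outstanding command finishes. */
	q->inflight = 0;

	/* New I/O arriving now still sees nccbs_avail == 0. */
	for (unsigned i = 0; i < arrivals; i++)
		(void)toy_submit(q);

	/* Too late: nothing is in flight to trigger another drain. */
	q->nccbs_avail += done;
}

/* Stalled: work is waiting, but no completion will ever arrive. */
static bool
toy_stalled(const struct toy_queue *q)
{
	return q->waiting > 0 && q->inflight == 0;
}
```

With a two-ccb queue, two successful toy_submit() calls followed by
toy_complete_deferred_reset(&q, 1) leave one request waiting, nothing in
flight, and both ccbs marked available again, which is the hang as
described.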

Can you please try the patch from
http://www.netbsd.org/~jdolecek/nvme_avail_put.diff ?

Initial testing with this patch is looking good. I'm currently running a
'cvs update' against the same tree in which I'm running a "build.sh -j24
release" and so far no hang.

It's compile tested only, so it might need some tweaks. The idea is to reset
the ccb counter immediately, so lddone() would be able to queue another
I/O while the completion queue is still being processed. This should also
fix a ccb leak on errors - e.g. nvme_ns_dobio() calls just nvme_ccb_put()
when bus_dmamap_load() fails, so q_nccbs_avail stays decremented from
nvme_ccb_get().
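
As a rough illustration of that idea (not the contents of the patch), the
sketch below pairs every decrement in a "get" with an increment in the
matching "put", so an error path that only calls put no longer leaks a
slot. All names (struct toy_ccbq, toy_ccb_get(), toy_ccb_put(),
toy_dobio()) are hypothetical, and the dma_load_error argument is an
assumption standing in for a failing bus_dmamap_load().

```c
#include <assert.h>
#include <errno.h>

/* Toy model only; every name here is hypothetical, not from the driver. */
struct toy_ccbq {
	unsigned nccbs;       /* total ccbs owned by the queue */
	unsigned nccbs_avail; /* ccbs currently free */
};

/* Take a ccb: decrement the counter, or report EAGAIN if none is free. */
static int
toy_ccb_get(struct toy_ccbq *q)
{
	if (q->nccbs_avail == 0)
		return EAGAIN;
	q->nccbs_avail--;
	return 0;
}

/* Return a ccb: restore the counter right away, not at end of batch. */
static void
toy_ccb_put(struct toy_ccbq *q)
{
	q->nccbs_avail++;
	/* Invariant: never more free ccbs than the queue owns. */
	assert(q->nccbs_avail <= q->nccbs);
}

/* Submit one I/O; a nonzero dma_load_error simulates a failed DMA load. */
static int
toy_dobio(struct toy_ccbq *q, int dma_load_error)
{
	int error = toy_ccb_get(q);

	if (error != 0)
		return error;
	if (dma_load_error != 0) {
		/* Error path: put alone is enough to undo the get. */
		toy_ccb_put(q);
		return dma_load_error;
	}
	/* Success: the ccb stays taken until completion calls put. */
	return 0;
}
```

Because the counter is adjusted inside put itself, a failed toy_dobio()
leaves nccbs_avail unchanged, and the assertion in toy_ccb_put() would
catch a state like q_nccbs_avail = 0x21 with q_nccbs = 0x20.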

Just based on reading the patch, it would appear to make sense to commit
even if it doesn't completely fix the hang. The patch might not be
"sufficient" but it would appear to be "necessary". :)



+------------------+--------------------------+----------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:          |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee dot com   |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd dot org |
+------------------+--------------------------+----------------------------+
