The following reply was made to PR kern/48733; it has been noted by GNATS.
From: Manuel Bouyer <bouyer%antioche.eu.org@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: kern-bug-people%NetBSD.org@localhost, gnats-admin%NetBSD.org@localhost,
netbsd-bugs%NetBSD.org@localhost
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 11 Apr 2014 17:59:09 +0200
On Fri, Apr 11, 2014 at 02:40:00PM +0000,
Wolfgang.Stukenbrock%nagler-company.com@localhost wrote:
> >Description:
> Problem located in /src/sys/netinet/ip_output.c.
> Since file revision 1.208 the KERNEL_LOCK is taken before calling if_output
> on the interface.
> Now at least the wm driver calls splnet() and splx() inside its output
> routine.
> If any interrupt occurs between splnet() and splx(), the interrupt is delayed
> and is processed in splx() when the level is released again.
> If such an interrupt is e.g. not MP-safe, the call stub in
> intr_biglock_wrapper() is used to call the interrupt routine, and that one
> will take the KERNEL_LOCK again.
> So we try to lock it again here -> deadlock.
>
> Our system runs fine with four 8257x interfaces, but after adding two additional
> 8254x interfaces, the system locks up after a short time. Don't ask me why the
> if_output call takes "too long" with these two additional interfaces, but it is
> reproducible.
> I've analysed this several times with DDB. Most times I've seen a USB interrupt
> that deadlocks the system.
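The sequence described in the report (an interrupt held back by splnet() and then delivered from inside splx()) can be modeled in a few lines of C. This is a minimal single-CPU sketch under stated assumptions: model_splraise(), model_splx(), model_interrupt() and the numeric levels are hypothetical names, not NetBSD's spl(9) implementation, and it assumes the USB interrupt is masked at the network level, as the report implies.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical single-CPU model of deferred interrupt delivery.
 * Names and levels are illustrative assumptions, not NetBSD's spl(9).
 */
enum { MODEL_IPL_NONE = 0, MODEL_IPL_NET = 4 };

struct model_cpu {
    int ipl;                  /* current interrupt priority level */
    int pending_ipl;          /* level of the deferred interrupt */
    void (*pending)(void);    /* deferred handler, NULL if none */
};

static struct model_cpu cpu0 = { MODEL_IPL_NONE, 0, NULL };
static int usb_handler_runs;

static void usb_intr(void) { usb_handler_runs++; }

/* Raise the level (like splnet()) and return the previous one. */
static int model_splraise(int ipl)
{
    int old = cpu0.ipl;
    if (ipl > cpu0.ipl)
        cpu0.ipl = ipl;
    return old;
}

/* An interrupt at level ipl arrives: run it now, or defer it if masked. */
static void model_interrupt(int ipl, void (*handler)(void))
{
    if (ipl > cpu0.ipl) {
        handler();                /* not masked: runs immediately */
    } else {
        cpu0.pending_ipl = ipl;   /* masked: remember it for later */
        cpu0.pending = handler;
    }
}

/* Restore the old level (like splx()) and run anything now unmasked. */
static void model_splx(int old)
{
    cpu0.ipl = old;
    if (cpu0.pending != NULL && cpu0.pending_ipl > cpu0.ipl) {
        void (*h)(void) = cpu0.pending;
        cpu0.pending = NULL;
        h();                      /* the deferred handler runs inside splx */
    }
}

/* True if the handler was held back by the raised level and ran in splx. */
bool demo_deferred_interrupt(void)
{
    usb_handler_runs = 0;
    int s = model_splraise(MODEL_IPL_NET);
    model_interrupt(MODEL_IPL_NET, usb_intr); /* assumed masked at this level */
    bool deferred = (usb_handler_runs == 0);
    model_splx(s);
    return deferred && usb_handler_runs == 1;
}
```

demo_deferred_interrupt() returns true when the handler did not run at the raised level and instead ran from inside model_splx(), which is the point at which the report says the big-lock wrapper is entered.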
I think your analysis is wrong. The KERNEL_LOCK is special in the sense that
it can be locked multiple times on the same CPU. So it's not a problem
that splx() on the same CPU tries to get the KERNEL_LOCK again; it will just
increase the lock count. A splx() on another CPU will wait for the
KERNEL_LOCK to be released.
I think your problem is more likely in the USB stack.
Maybe one of your new Ethernet interfaces shares an interrupt with the
USB controller?
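Why a recursive big lock does not deadlock on re-entry from the same CPU can be illustrated with a small model. This is a hypothetical sketch, not NetBSD's kern_lock.c: struct biglock, biglock_tryenter(), biglock_exit() and run_handler_under_biglock() are invented names, and a CPU that finds the lock held elsewhere is modeled as a failed try (where real code would spin) so the example stays single-threaded and deterministic.

```c
#include <stdbool.h>

/*
 * Hypothetical model of a per-CPU recursive "big lock" like KERNEL_LOCK.
 * Layout and names are illustrative assumptions, not NetBSD's code.
 */
#define NO_CPU (-1)

struct biglock {
    int owner;   /* CPU holding the lock, or NO_CPU */
    int count;   /* recursion depth on the owning CPU */
};

/* Try to take the lock for `cpu`; false means another CPU holds it. */
static bool biglock_tryenter(struct biglock *bl, int cpu)
{
    if (bl->owner == NO_CPU) {
        bl->owner = cpu;
        bl->count = 1;
        return true;
    }
    if (bl->owner == cpu) {   /* same CPU: just bump the count */
        bl->count++;
        return true;
    }
    return false;             /* held elsewhere: caller would have to wait */
}

static void biglock_exit(struct biglock *bl, int cpu)
{
    if (bl->owner != cpu || bl->count == 0)
        return;               /* misuse; a real kernel would panic */
    if (--bl->count == 0)
        bl->owner = NO_CPU;
}

/* Stand-in for a wrapper running a non-MP-safe handler under the lock. */
static bool run_handler_under_biglock(struct biglock *bl, int cpu, int *max_depth)
{
    if (!biglock_tryenter(bl, cpu))
        return false;
    if (bl->count > *max_depth)
        *max_depth = bl->count;
    /* ... the interrupt handler body would run here ... */
    biglock_exit(bl, cpu);
    return true;
}

/* CPU 0 holds the lock for output, then a deferred handler runs on CPU 0. */
int demo_same_cpu_reentry(void)
{
    struct biglock bl = { NO_CPU, 0 };
    int max_depth = 0;
    biglock_tryenter(&bl, 0);                 /* output path takes the lock */
    if (!run_handler_under_biglock(&bl, 0, &max_depth))
        return -1;                            /* would have deadlocked */
    biglock_exit(&bl, 0);
    return max_depth;                         /* depth reached while nested */
}

/* CPU 1 cannot take the lock until CPU 0 releases it. */
bool demo_other_cpu_waits(void)
{
    struct biglock bl = { NO_CPU, 0 };
    int max_depth = 0;
    biglock_tryenter(&bl, 0);
    bool while_held = run_handler_under_biglock(&bl, 1, &max_depth);
    biglock_exit(&bl, 0);
    bool after_release = run_handler_under_biglock(&bl, 1, &max_depth);
    return !while_held && after_release;
}
```

demo_same_cpu_reentry() reaches a recursion depth of 2 instead of deadlocking, and demo_other_cpu_waits() shows the second CPU only acquiring the lock after the first releases it. The model says nothing about other causes of a hang, such as a problem in the USB stack.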
> >How-To-Repeat:
> Run a lot of traffic over wm interfaces and do something, e.g. on USB, at the
> same time. It is just a question of time until the system deadlocks.
> >Fix:
> First guess: revert the change from 1.207 to 1.208.
> But I've no idea about side effects.
That would be very bad: the output queues are protected by the KERNEL_LOCK and splnet().
If you revert ip_output.c 1.208, you'll also have to revert ip_input.c
1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
--
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
NetBSD: 26 years of experience will always make the difference