NetBSD-Bugs archive


Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault trap, code=0



The following reply was made to PR kern/55402; it has been noted by GNATS.

From: Frank Kardel <kardel%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc: 
Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault
 trap, code=0
Date: Sat, 20 Jun 2020 21:26:03 +0200

 It double faulted again. We are too close to the kernel stack limit. I fear
 we need to save more stack space and/or extend the kernel stack (if
 possible).
 
 The issue here is that the stack usage is probably data dependent, and we
 must not trip just because the pool scan needs more recursion stack space.
 
 Frank
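
Frank's point that the scan's stack usage is data dependent is the classic problem with deep recursion. One general technique for bounding it is to replace the recursion with an explicit work stack allocated from the heap, so that tree depth costs heap memory instead of kernel stack. The C sketch below shows that technique under stated assumptions only: it is not how dsl_scan.c is written, `struct node` and `visit_iterative()` are hypothetical, and malloc(3)/realloc(3) stand in for a kernel allocator such as kmem_alloc(9).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node, standing in for a block pointer with children. */
struct node {
	long value;
	int nchildren;
	struct node **children;
};

/*
 * Sum the values of every node reachable from root without recursion.
 * Pending nodes are kept in a heap-allocated array that doubles when
 * full, so C stack usage stays constant however deep the tree is.
 * Returns 0 on success, -1 if the work stack cannot be allocated.
 */
int
visit_iterative(const struct node *root, long *sum)
{
	size_t cap = 16, top = 0;
	const struct node **stack;

	assert(sum != NULL);
	*sum = 0;
	if (root == NULL)
		return 0;

	stack = malloc(cap * sizeof(*stack));
	if (stack == NULL)
		return -1;
	stack[top++] = root;

	while (top > 0) {
		const struct node *n = stack[--top];

		*sum += n->value;
		for (int i = 0; i < n->nchildren; i++) {
			if (top == cap) {
				/* Grow the work stack instead of the call stack. */
				const struct node **grown =
				    realloc(stack, 2 * cap * sizeof(*stack));
				if (grown == NULL) {
					free(stack);
					return -1;
				}
				stack = grown;
				cap *= 2;
			}
			stack[top++] = n->children[i];
		}
	}
	free(stack);
	return 0;
}
```

The price is a new failure mode: the caller must handle visit_iterative() returning -1, which the recursive form never had to do.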
 
 
 On 06/20/20 21:05, Frank Kardel wrote:
 > The following reply was made to PR kern/55402; it has been noted by GNATS.
 >
 > From: Frank Kardel <kardel%netbsd.org@localhost>
 > To: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost,
 >   gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
 > Cc:
 > Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault
 >   trap, code=0
 > Date: Sat, 20 Jun 2020 21:00:14 +0200
 >
 >   I can do that.
 >
 >   I do have the feeling that the zfs code is very generous with stack.
 >   Also, the scan has a recursive structure, and I don't know the upper
 >   bound of the recursion. This mini pool was only around 800GB, and every
 >   three recursion levels eat about 1k of stack.
 >
 >   We seem to be running pretty close to our kernel stack limit when using
 >   zfs. Maybe enlarging our kernel stack could also be an option.
 >
 >   Other systems seem to be able to handle normal zfs operations.
 >
 >   I will check with dsl_scan.c in its original form anyway; that takes
 >   some time, though.
 >
 >   Frank
 >   
 >   
 >   
 >   On 06/20/20 20:45, Jaromír Doleček wrote:
 >   > The following reply was made to PR kern/55402; it has been noted by GNATS.
 >   >
 >   > From: Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>
 >   > To: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>
 >   > Cc:
 >   > Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault
 >   >   trap, code=0
 >   > Date: Sat, 20 Jun 2020 20:39:52 +0200
 >   >
 >   >   Can you confirm whether it's enough to apply just the change for
 >   >   vdev_queue.c, i.e. can you try the scrub with dsl_scan.c the same as in
 >   >   the repository (without the patch)?
 >   >
 >   >   I'd prefer to keep dsl_scan.c closer to upstream unless it's absolutely
 >   >   necessary to change it.
 >   >
 >   >   Jaromir
 >   >
 >   >   On Sat, 20 Jun 2020 at 20:28, Frank Kardel <kardel%netbsd.org@localhost> wrote:
 >   >   >
 >   >   > That did it: the scrub run has now completed successfully.
 >   >   >
 >   >   > Frank
 >   >   >
 >   >   >
 >   >   > On 06/20/20 19:00, Jaromír Doleček wrote:
 >   >   > > The following reply was made to PR kern/55402; it has been noted by GNATS.
 >   >   > >
 >   >   > > From: Jaromír Doleček <jaromir.dolecek@gmail.com>
 >   >   > > To: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>
 >   >   > > Cc:
 >   >   > > Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault
 >   >   > >   trap, code=0
 >   >   > > Date: Sat, 20 Jun 2020 18:55:28 +0200
 >   >   > >
 >   >   > >   On Sat, 20 Jun 2020 at 15:45, Frank Kardel <kardel%netbsd.org@localhost> wrote:
 >   >   > >   >  Well, it gets a bit further but hits the double fault
 >   >   > >   >  now with 57 frames (two of them > 1k) at about the same
 >   >   > >   >  accumulated stack size.
 >   >   > >   >
 >   >   > >   >  Some space was saved by the patch, but presumably not enough.
 >   >   > >
 >   >   > >   OK, I've updated the patch. My previous change in
 >   >   > >   vdev_queue_io_to_issue() did not work: gcc releases the stack space
 >   >   > >   at the end of the function, not when the block goes out of scope.
 >   >   > >
 >   >   > >   The new version reduces vdev_queue_io_to_issue() to only 160 bytes
 >   >   > >   of stack instead of 1152, and the dsl_scan_visitbp() +
 >   >   > >   dsl_scan_visitdnode() pair now takes 40 fewer bytes.
 >   >   > >
 >   >   > >   Can you check if this is enough to get it through?
 >   >   > >
 >   >   > >   http://www.netbsd.org/~jdolecek/zfs_reduce_stack.diff
 >   >   > >
 >   >   > >   Jaromir
 >   >   > >
 >   >   >
 >   >
 >   
 
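Jaromír's remark that gcc releases stack space only at the end of a function, not at the end of an inner block, matches how gcc usually lays out frames: block-scoped locals are normally given slots in the single frame reserved for the whole call. A common way to keep a large temporary out of a frame that stays live across recursive calls is to move it into a separate helper marked noinline, whose frame is popped when it returns. The sketch below illustrates that technique only; the thread does not show the patch contents, and `scratch_work()`, `visit()` and `SCRATCH_SIZE` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SCRATCH_SIZE 1024	/* hypothetical size of a large temporary */

/*
 * Work that needs a big temporary buffer.  noinline (a GCC/Clang
 * attribute) keeps the buffer in this helper's own frame, which is
 * released when the helper returns, instead of letting it be merged
 * into the recursive caller's frame.
 */
static __attribute__((noinline)) long
scratch_work(int level)
{
	unsigned char scratch[SCRATCH_SIZE];
	long sum = 0;

	memset(scratch, level & 0xff, sizeof(scratch));
	for (size_t i = 0; i < sizeof(scratch); i++)
		sum += scratch[i];
	return sum;
}

/*
 * Hypothetical recursive visitor.  The buffer lives in scratch_work(),
 * which returns before the recursive call, so visit()'s own frame stays
 * small and each extra level of recursion does not add SCRATCH_SIZE bytes.
 */
long
visit(int depth)
{
	long total;

	assert(depth >= 0);
	total = scratch_work(depth);
	if (depth > 0)
		total += visit(depth - 1);
	return total;
}
```

With this layout, at most one copy of the buffer is live at any time, and each additional level of visit() typically adds only its own small frame. gcc's -fstack-usage option (per-function frame sizes written to .su files) or -Wframe-larger-than= can be used to check the effect with a particular compiler and optimisation level.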

