The following reply was made to PR kern/55402; it has been noted by GNATS.
From: Frank Kardel <kardel%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost,
gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost
Cc:
Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault trap, code=0
Date: Sat, 20 Jun 2020 21:00:14 +0200
I can do that.
I have the feeling that the zfs code is very generous with stack. Also,
the scan code has a recursive structure, and I don't know the upper bound
of the recursion depth. This small pool was only about 800GB, and every
three levels of recursion eat about 1k of stack.
We seem to be running pretty close to our kernel stack limit when
using zfs.
Maybe enlarging our kernel stack could also be an option.
Other systems seem to be able to handle normal zfs operations.
I will test with dsl_scan.c in its original form anyway; that takes some time, though.
Frank
On 06/20/20 20:45, Jaromír Doleček wrote:
> From: Jaromír Doleček <jaromir.dolecek%gmail.com@localhost>
> To: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>
> Cc:
> Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault trap, code=0
> Date: Sat, 20 Jun 2020 20:39:52 +0200
>
> Can you confirm whether it's enough to apply just the change to
> vdev_queue.c, i.e. can you try the scrub with dsl_scan.c as it is in the
> repository (without the patch)?
>
> I'd prefer to keep dsl_scan.c close to upstream unless a change is
> absolutely necessary.
>
> Jaromir
>
> On Sat, 20 Jun 2020 at 20:28, Frank Kardel <kardel%netbsd.org@localhost> wrote:
> >
> > That did it: the scrub run has now completed successfully.
> >
> > Frank
> >
> >
> > On 06/20/20 19:00, Jaromír Doleček wrote:
> > > From: Jaromír Doleček <jaromir.dolecek@gmail.com>
> > > To: "gnats-bugs%NetBSD.org@localhost" <gnats-bugs%netbsd.org@localhost>
> > > Cc:
> > > Subject: Re: kern/55402: amd64/9.99.68/9.99.68: xen/zfs - kernel: double fault trap, code=0
> > > Date: Sat, 20 Jun 2020 18:55:28 +0200
> > >
> > > On Sat, 20 Jun 2020 at 15:45, Frank Kardel <kardel%netbsd.org@localhost> wrote:
> > > > Well, it gets a bit further but still hits the double fault,
> > > > now with 57 frames (two of them larger than 1k) at about the same
> > > > accumulated stack size.
> > > >
> > > > Some space was saved by the patch, but apparently not enough.
> > >
> > > OK, I've updated the patch. My previous change in
> > > vdev_queue_io_to_issue() did not work: gcc releases the stack space
> > > for a block's locals at the end of the function, not when execution
> > > leaves the block.
> > >
> > > The new version reduces the stack usage of vdev_queue_io_to_issue()
> > > from 1152 bytes to only 160 bytes, and the dsl_scan_visitbp() +
> > > dsl_scan_visitdnode() pair now takes 40 bytes less.
> > >
> > > Can you check if this is enough to get it through?
> > >
> > > http://www.netbsd.org/~jdolecek/zfs_reduce_stack.diff
> > >
> > > Jaromir
> > >
> >
>