NetBSD-Bugs archive
Re: port-amd64/52596 (Another netbsd-8 panic)
The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
From: Dominik Bialy <dmb%yenn.ulegend.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-amd64-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
netbsd-bugs%netbsd.org@localhost, dmb%yenn.ulegend.net@localhost
Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
Date: Mon, 18 Dec 2017 13:50:24 +0100
It is still happening... Although I thought I had found
the cause, nope, I'm still in the middle of
nowhere :-)
I'm ready to apply any patches that can help in
nailing it down, as Kamil@ suggested.
I did heavy testing with memtest86+, and the
hardware looks stable.
PS: please reopen the PR if you folks still care
PS2: What can be the cause of the garbage
in x86_xsave_features? Is it possible that
some rotting subsystem is overwriting it?
I'm using, for example, veriexec and KAME altq.
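For what it's worth, here is a quick way to tell a plausible value from
garbage. This is only a minimal sketch, assuming the variable holds an
XCR0-style mask as reported by CPUID leaf 0xd; the helper name is made up
and is not kernel code. Architecturally, the x87 bit (bit 0) is always set
in such a mask, and AVX (bit 2) requires SSE (bit 1). Zero is also accepted
here, on the assumption (drawn from this thread) that the kernel leaves the
variable at 0 when it falls back to fxsave.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits as defined by the x86 architecture. */
#define XCR0_X87 (UINT64_C(1) << 0)
#define XCR0_SSE (UINT64_C(1) << 1)
#define XCR0_AVX (UINT64_C(1) << 2)

/*
 * Hypothetical helper (not a NetBSD function): returns true if `mask`
 * could be a legitimate xsave feature mask.
 */
static bool
xsave_mask_plausible(uint64_t mask)
{
	/* Assumption: 0 means "xsave not in use" (fxsave fallback). */
	if (mask == 0)
		return true;
	/* The x87 component must always be present. */
	if ((mask & XCR0_X87) == 0)
		return false;
	/* AVX state cannot be enabled without SSE state. */
	if ((mask & XCR0_AVX) != 0 && (mask & XCR0_SSE) == 0)
		return false;
	return true;
}
```

With this check, the 7 quoted below passes, while the 160b78a0 from the
coredump fails because bit 0 is clear.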
PS3: I also tried the patch with the "mfence"
instruction. It didn't help.
-Dominik
On Mon, Oct 09, 2017 at 07:20:00PM +0000, Kamil Rytarowski wrote:
> The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
>
> From: Kamil Rytarowski <n54%gmx.com@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc:
> Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
> Date: Mon, 9 Oct 2017 21:21:13 +0200
>
> On 09.10.2017 15:52, Dominik Bialy wrote:
> > On Mon, Oct 09, 2017 at 01:00:02PM +0000, Kamil Rytarowski wrote:
> >> The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
> >>
> >> From: Kamil Rytarowski <n54%gmx.com@localhost>
> >> To: gnats-bugs%NetBSD.org@localhost
> >> Cc:
> >> Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
> >> Date: Mon, 9 Oct 2017 14:58:28 +0200
> >>
> >> On 09.10.2017 14:40, Dominik Bialy wrote:
> >> > The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
> >> >
> >> > From: Dominik Bialy <dmb%yenn.ulegend.net@localhost>
> >> > To: coypu%sdf.org@localhost
> >> > Cc: Dominik Bialy <dmb%yenn.ulegend.net@localhost>, gnats-bugs%NetBSD.org@localhost
> >> > Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
> >> > Date: Mon, 9 Oct 2017 14:37:47 +0200
> >> >
> >> > On Mon, Oct 09, 2017 at 10:13:34AM +0000, coypu%sdf.org@localhost wrote:
> >> > > On Mon, Oct 09, 2017 at 10:01:47AM +0200, Dominik Bialy wrote:
> >> > > > Current sysctls are:
> >> > > >
> >> > > > yenn# sysctl machdep.xsave_features
> >> > > > machdep.xsave_features = 0
> >> > > > yenn# sysctl machdep.fpu_save
> >> > > > machdep.fpu_save = 1
> >> > > >
> >> > > > I'll try applying the patch today and building the kernel.
> >> > >
> >> > > sorry, I misread the code, it shouldn't make a functional difference
> >> > > either way.
> >> > >
> >> > > do you have a coredump in /var/crash?
> >> > > can you:
> >> > > gunzip netbsd.3.core.gz
> >> > > gunzip netbsd.3.gz
> >> > > crash -M netbsd.3.core -N netbsd.3
> >> > >
> >> > > crash> dmesg
> >> > > (only to confirm it died at the same spot)
> >> > > crash> examine x86_xsave_features
> >> > > crash> bt
> >> >
> >> > I found one coredump from Sep 23 (sources were
> >> > dated around Sep 15.)
> >> >
> >> > fatal privileged instruction fault in supervisor mode
> >> > trap type 0 code 0 rip 0xffffffff80224a52 cs 0x8 rflags 0x10016 cr2 0x75ba90c36d60 ilevel 0x8 rsp 0xfffffe804057bea8
> >> > curlwp 0xfffffe81318c2720 pid 391.2 lowest kstack 0xfffffe80405792c0
> >> > panic: trap
> >> > cpu1: Begin traceback...
> >> > vpanic() at netbsd:vpanic+0x140
> >> > snprintf() at netbsd:snprintf
> >> > startlwp() at netbsd:startlwp
> >> > alltraps() at netbsd:alltraps+0x96
> >> > fpudna() at netbsd:fpudna+0x61
> >> > cpu1: End traceback...
> >> >
> >> > dumping to dev 18,1 (offset=132519, size=1032011):
> >> > dump
> >> > crash> examine x86_xsave_features
> >> > x86_xsave_features: 160b78a0
> >>
> >> Looks like trash..
> >>
> >> Please try:
> >> examine x86_fpu_save_size
> >> examine x86_fpu_save
> >> examine i386_nocpuid_cpus
> >>
> >> (checking if the stack has been damaged)
> >
> > yenn# crash -M netbsd.6.core -N netbsd.6
> > Crash version 8.0_BETA, image version 8.0_BETA.
> > System panicked: trap
> > Backtrace from time of crash is available.
> > crash> examine x86_fpu_save_size
> > x86_fpu_save_size: 200
> > crash> examine x86_fpu_save
> > x86_fpu_save: 1
> > crash> examine i386_nocpuid_cpus
> > i386_nocpuid_cpus: 1
> > crash>
> >
>
> So something is overwriting x86_xsave_features with trash.
>
> A valid value would look like this:
> $ sysctl machdep.xsave_features
>
> machdep.xsave_features = 7
>
> Unless I am missing something, the only place that sets this value is:
>
> /src/sys/arch/x86/x86/identcpu.c: cpu_probe_fpu(struct cpu_info *ci)
>
> x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
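
As an illustration of the assignment above, here is a userland sketch. The
function name is hypothetical, and the register layout is an assumption:
it presumes that x86_cpuid() fills descs[] in EAX, EBX, ECX, EDX order, so
that descs[0] and descs[3] are the low and high halves of the supported
feature mask.

```c
#include <stdint.h>

/*
 * Illustrative helper (hypothetical, not kernel code): combine the two
 * 32-bit halves of the CPUID leaf 0xd feature mask into one 64-bit value.
 */
static uint64_t
xsave_features_from_cpuid(const uint32_t descs[4])
{
	/*
	 * The cast binds tighter than the shift, so descs[3] is widened
	 * to 64 bits before being shifted and no high bits are lost.
	 */
	return (uint64_t)descs[3] << 32 | descs[0];
}
```

With descs[0] = 7 and descs[3] = 0 this yields 7, matching the valid
sysctl output above.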
>
> It would be easier to track it down with a reproducer, with temporary
> asserts.. but I expect that we are restricted to reading the code.
>
> A possible hand-made assert is to add a panic() like this:
>
>     /* Get features and maximum size of the save area */
>     x86_cpuid(0xd, descs);
>     if (descs[2] > 512)
>         x86_fpu_save_size = descs[2];
>
> +   panic("Oops how did we get here!\n");
> #ifdef XEN
>     /* Don't use xsave, force fxsave with x86_xsave_features = 0. */
> #else
>     x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
> #endif
>
> Once it fires, we will need the stack trace.
>
> >>
> >> > crash> bt
> >> > _KERNEL_OPT_NARCNET() at 0
> >> > _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x7
> >> > vpanic() at vpanic+0x149
> >> > snprintf() at snprintf
> >> > startlwp() at startlwp
> >> > calltrap() at calltrap+0x11
> >> > fpudna() at fpudna+0x61
> >> > crash>