NetBSD-Bugs archive

Re: port-amd64/52596 (Another netbsd-8 panic)



It is still happening… I thought I had found
the cause, but nope, I'm still in the middle of
nowhere :-)

I'm ready to apply any patches that can help
nail it down, as Kamil@ suggested.

I did heavy testing with memtest86+, and the
hardware looks stable.

PS:  please reopen the PR if you folks still care

PS2: what could be causing the garbage
     in x86_xsave_features? Is it possible that
     some rotting subsystem is overwriting it?
     For example, I'm using veriexec and KAME altq.

PS3: I also tried the patch with the "mfence"
     instruction.  It didn't help.

-Dominik

On Mon, Oct 09, 2017 at 07:20:00PM +0000, Kamil Rytarowski wrote:
> The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
> 
> From: Kamil Rytarowski <n54%gmx.com@localhost>
> To: gnats-bugs%NetBSD.org@localhost
> Cc: 
> Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
> Date: Mon, 9 Oct 2017 21:21:13 +0200
> 
>  On 09.10.2017 15:52, Dominik Bialy wrote:
>  > On Mon, Oct 09, 2017 at 01:00:02PM +0000, Kamil Rytarowski wrote:
>  >> The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
>  >>
>  >> From: Kamil Rytarowski <n54%gmx.com@localhost>
>  >> To: gnats-bugs%NetBSD.org@localhost
>  >> Cc:
>  >> Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
>  >> Date: Mon, 9 Oct 2017 14:58:28 +0200
>  >>
>  >>  On 09.10.2017 14:40, Dominik Bialy wrote:
>  >>  > The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
>  >>  >
>  >>  > From: Dominik Bialy <dmb%yenn.ulegend.net@localhost>
>  >>  > To: coypu%sdf.org@localhost
>  >>  > Cc: Dominik Bialy <dmb%yenn.ulegend.net@localhost>, gnats-bugs%NetBSD.org@localhost
>  >>  > Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
>  >>  > Date: Mon, 9 Oct 2017 14:37:47 +0200
>  >>  >
>  >>  >  On Mon, Oct 09, 2017 at 10:13:34AM +0000, coypu%sdf.org@localhost wrote:
>  >>  >  > On Mon, Oct 09, 2017 at 10:01:47AM +0200, Dominik Bialy wrote:
>  >>  >  > > Current sysctls are:
>  >>  >  > >
>  >>  >  > > yenn# sysctl machdep.xsave_features
>  >>  >  > > machdep.xsave_features = 0
>  >>  >  > > yenn# sysctl machdep.fpu_save
>  >>  >  > > machdep.fpu_save = 1
>  >>  >  > >
>  >>  >  > > I'll try applying the patch today and building the kernel.
>  >>  >  >
>  >>  >  > sorry, I misread the code, it shouldn't make a functional difference
>  >>  >  > either way.
>  >>  >  >
>  >>  >  > do you have a coredump in /var/crash?
>  >>  >  > can you:
>  >>  >  > gunzip netbsd.3.core.gz
>  >>  >  > gunzip netbsd.3.gz
>  >>  >  > crash -M netbsd.3.core -N netbsd.3
>  >>  >  >
>  >>  >  > crash> dmesg
>  >>  >  > (only to confirm it died at the same spot)
>  >>  >  > crash> examine x86_xsave_features
>  >>  >  > crash> bt
>  >>  >
>  >>  >  I found one coredump from Sep 23 (sources were
>  >>  >  dated around Sep 15.)
>  >>  >
>  >>  >  fatal privileged instruction fault in supervisor mode
>  >>  >  trap type 0 code 0 rip 0xffffffff80224a52 cs 0x8 rflags 0x10016 cr2 0x75ba90c36d60 ilevel 0x8 rsp 0xfffffe804057bea8
>  >>  >  curlwp 0xfffffe81318c2720 pid 391.2 lowest kstack 0xfffffe80405792c0
>  >>  >  panic: trap
>  >>  >  cpu1: Begin traceback...
>  >>  >  vpanic() at netbsd:vpanic+0x140
>  >>  >  snprintf() at netbsd:snprintf
>  >>  >  startlwp() at netbsd:startlwp
>  >>  >  alltraps() at netbsd:alltraps+0x96
>  >>  >  fpudna() at netbsd:fpudna+0x61
>  >>  >  cpu1: End traceback...
>  >>  >
>  >>  >  dumping to dev 18,1 (offset=132519, size=1032011):
>  >>  >  dump
>  >>  >  crash> examine x86_xsave_features
>  >>  >  x86_xsave_features:     160b78a0
>  >>
>  >>  Looks like trash..
>  >>
>  >>  Please try:
>  >>  examine x86_fpu_save_size
>  >>  examine x86_fpu_save
>  >>  examine i386_nocpuid_cpus
>  >>
>  >>  (checking if the stack has been damaged)
>  >
>  > yenn# crash -M netbsd.6.core -N netbsd.6
>  > Crash version 8.0_BETA, image version 8.0_BETA.
>  > System panicked: trap
>  > Backtrace from time of crash is available.
>  > crash> examine x86_fpu_save_size
>  > x86_fpu_save_size:      200
>  > crash> examine x86_fpu_save
>  > x86_fpu_save:   1
>  > crash> examine i386_nocpuid_cpus
>  > i386_nocpuid_cpus:      1
>  > crash>
>  >
>  
>  So something is overwriting x86_xsave_features with trash.
>  
>  A valid value would look like this:
>  $ sysctl machdep.xsave_features
>  machdep.xsave_features = 7
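
For reference, a sane value and the value found in the crash dump can be told apart mechanically. The C sketch below assumes the XCR0 consistency rules from the Intel SDM (x87 state always present, AVX requires SSE, the two MPX bits set together, the three AVX-512 bits set together and only with AVX), and treats 0 as valid because sysctl reports machdep.xsave_features = 0 on the affected machine (fxsave in use). xsave_mask_plausible() and the XCR0_* names are hypothetical, not NetBSD identifiers. Under those rules 7 (x87|SSE|AVX) passes, while 0x160b78a0 fails at once because bit 0 (x87) is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits per the Intel SDM, Vol. 1, ch. 13.
 * Names are illustrative, not taken from NetBSD headers. */
#define XCR0_X87       (UINT64_C(1) << 0)
#define XCR0_SSE       (UINT64_C(1) << 1)
#define XCR0_AVX       (UINT64_C(1) << 2)
#define XCR0_BNDREGS   (UINT64_C(1) << 3)
#define XCR0_BNDCSR    (UINT64_C(1) << 4)
#define XCR0_OPMASK    (UINT64_C(1) << 5)
#define XCR0_ZMM_HI256 (UINT64_C(1) << 6)
#define XCR0_HI16_ZMM  (UINT64_C(1) << 7)
#define XCR0_AVX512    (XCR0_OPMASK | XCR0_ZMM_HI256 | XCR0_HI16_ZMM)

/*
 * Heuristic sanity check for an x86_xsave_features-style mask.
 * Returns true for 0 (xsave not used) or a mask obeying the XCR0
 * consistency rules; false if the mask looks like garbage.
 */
bool
xsave_mask_plausible(uint64_t m)
{
	if (m == 0)
		return true;		/* xsave disabled; fxsave/fsave in use */
	if ((m & XCR0_X87) == 0)
		return false;		/* x87 state must always be present */
	if ((m & XCR0_AVX) != 0 && (m & XCR0_SSE) == 0)
		return false;		/* AVX state requires SSE state */
	if (((m & XCR0_BNDREGS) != 0) != ((m & XCR0_BNDCSR) != 0))
		return false;		/* MPX bits come as a pair */
	if ((m & XCR0_AVX512) != 0 &&
	    ((m & XCR0_AVX512) != XCR0_AVX512 || (m & XCR0_AVX) == 0))
		return false;		/* AVX-512 bits come as a trio, with AVX */
	return true;
}
```

This is only a plausibility test: a mask that passes can still be wrong for the CPU at hand, but one that fails cannot have come from CPUID.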
>  
>  Unless I'm missing something, the only place where this value is set is:
>  
>  /src/sys/arch/x86/x86/identcpu.c: cpu_probe_fpu(struct cpu_info *ci)
>  
>  x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
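
The assignment above packs CPUID leaf 0xD, sub-leaf 0, into one 64-bit mask: EDX (descs[3]) supplies bits 63:32 and EAX (descs[0]) bits 31:0, while ECX (descs[2]) is the maximum save-area size that cpu_probe_fpu() compares against 512. A userland C sketch of the same computation follows, so the expected value for a given CPU can be checked outside the kernel. It assumes GCC's or Clang's <cpuid.h> (__get_cpuid_max, __cpuid, __cpuid_count); xsave_mask_from_regs() and read_xsave_leaf() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/*
 * Same packing as cpu_probe_fpu(): EDX is the high half, EAX the low half.
 * The cast is required; shifting a 32-bit value by 32 is undefined in C.
 */
uint64_t
xsave_mask_from_regs(uint32_t eax, uint32_t edx)
{
	return (uint64_t)edx << 32 | eax;
}

/*
 * Query CPUID leaf 0xD, sub-leaf 0 from userland. Returns false when
 * the leaf is unavailable (no XSAVE, old CPU, or non-x86 build).
 */
bool
read_xsave_leaf(uint64_t *mask, uint32_t *max_size)
{
#ifdef HAVE_CPUID_H
	unsigned int eax, ebx, ecx, edx;

	if (__get_cpuid_max(0, NULL) < 0xd)
		return false;		/* leaf 0xD not implemented */
	__cpuid(1, eax, ebx, ecx, edx);
	if ((ecx & (1u << 26)) == 0)
		return false;		/* CPUID.1:ECX.XSAVE (bit 26) clear */
	__cpuid_count(0xd, 0, eax, ebx, ecx, edx);
	(void)ebx;			/* EBX: size for currently enabled features */
	*mask = xsave_mask_from_regs(eax, edx);
	*max_size = ecx;		/* max save-area size, like descs[2] */
	return true;
#else
	(void)mask;
	(void)max_size;
	return false;
#endif
}
```

On a CPU reporting x87, SSE and AVX only, the mask would come out as 7, matching the sysctl output quoted above.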
>  
>  It would be easier to track it down with a reproducer and temporary
>  asserts, but I expect that we are restricted to reading the code.
>  
>  A possible hand-made assert is to put a panic() call here, like this:
>  
>  	/* Get features and maximum size of the save area */
>  	x86_cpuid(0xd, descs);
>  	if (descs[2] > 512)
>  		x86_fpu_save_size = descs[2];
>  
>  +       panic("Oops how did we get here!\n");
>  #ifdef XEN
>  	/* Don't use xsave, force fxsave with x86_xsave_features = 0. */
>  #else
>  	x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
>  #endif
>  
>  Once it fires, we will need the stack trace.
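
The panic() above catches an unexpected run of that code path in cpu_probe_fpu(), but not a stray write from elsewhere. A complementary technique, not proposed in this thread, is a shadow-copy ("guard") check: store the value together with its bitwise complement and verify the pair at points of use such as fpudna(), so that a stray write to either word breaks the invariant. The C sketch below is userland-only; struct guarded_u64, guarded_set() and guarded_intact() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guarded variable: the value plus its bitwise complement. */
struct guarded_u64 {
	uint64_t value;
	uint64_t inverse;
};

/* Store a new value and refresh its complement together. */
void
guarded_set(struct guarded_u64 *g, uint64_t v)
{
	g->value = v;
	g->inverse = ~v;
}

/* True while both words agree; a stray write to one of them breaks it. */
bool
guarded_intact(const struct guarded_u64 *g)
{
	return g->value == ~g->inverse;
}
```

In a debug kernel the check would become a KASSERT() or panic() at the point of use; whether it helps here depends on the corruption pattern, which is still unknown.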
>  
>  >>
>  >>  >  crash> bt
>  >>  >  _KERNEL_OPT_NARCNET() at 0
>  >>  >  _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x7
>  >>  >  vpanic() at vpanic+0x149
>  >>  >  snprintf() at snprintf
>  >>  >  startlwp() at startlwp
>  >>  >  calltrap() at calltrap+0x11
>  >>  >  fpudna() at fpudna+0x61
>  >>  >  crash>
>  >>  >


