NetBSD-Bugs archive


Re: port-amd64/52596 (Another netbsd-8 panic)



The following reply was made to PR port-amd64/52596; it has been noted by GNATS.

From: Dominik Bialy <dmb%yenn.ulegend.net@localhost>
To: gnats-bugs%NetBSD.org@localhost
Cc: port-amd64-maintainer%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
	netbsd-bugs%netbsd.org@localhost, dmb%yenn.ulegend.net@localhost
Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
Date: Mon, 18 Dec 2017 13:50:24 +0100

 It is still happening... I thought I had found
 the cause, but no, I'm still in the middle of
 nowhere :-)
 
 I'm ready to apply any patches that can help in
 nailing it down, as Kamil@ suggested.
 
 I did heavy testing with memtest86+, and the
 hardware looks stable.
 
 PS:  please reopen the PR if you folks still care
 
 PS2: What could be causing the garbage
      in x86_xsave_features? Is it possible that
      some rotting subsystem is overwriting it?
      I'm using, for example, veriexec and KAME altq.
 
 PS3: I also tried the patch with the "mfence"
      instruction.  It didn't help.
 
 -Dominik
 
 On Mon, Oct 09, 2017 at 07:20:00PM +0000, Kamil Rytarowski wrote:
 > The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
 > 
 > From: Kamil Rytarowski <n54%gmx.com@localhost>
 > To: gnats-bugs%NetBSD.org@localhost
 > Cc: 
 > Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
 > Date: Mon, 9 Oct 2017 21:21:13 +0200
 > 
 >  On 09.10.2017 15:52, Dominik Bialy wrote:
 >  > On Mon, Oct 09, 2017 at 01:00:02PM +0000, Kamil Rytarowski wrote:
 >  >> The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
 >  >>
 >  >> From: Kamil Rytarowski <n54%gmx.com@localhost>
 >  >> To: gnats-bugs%NetBSD.org@localhost
 >  >> Cc:
 >  >> Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
 >  >> Date: Mon, 9 Oct 2017 14:58:28 +0200
 >  >>
 >  >>  On 09.10.2017 14:40, Dominik Bialy wrote:
 >  >>  > The following reply was made to PR port-amd64/52596; it has been noted by GNATS.
 >  >>  >
 >  >>  > From: Dominik Bialy <dmb%yenn.ulegend.net@localhost>
 >  >>  > To: coypu%sdf.org@localhost
 >  >>  > Cc: Dominik Bialy <dmb%yenn.ulegend.net@localhost>, gnats-bugs%NetBSD.org@localhost
 >  >>  > Subject: Re: port-amd64/52596 (Another netbsd-8 panic)
 >  >>  > Date: Mon, 9 Oct 2017 14:37:47 +0200
 >  >>  >
 >  >>  >  On Mon, Oct 09, 2017 at 10:13:34AM +0000, coypu%sdf.org@localhost wrote:
 >  >>  >  > On Mon, Oct 09, 2017 at 10:01:47AM +0200, Dominik Bialy wrote:
 >  >>  >  > > Current sysctls are:
 >  >>  >  > >
 >  >>  >  > > yenn# sysctl machdep.xsave_features
 >  >>  >  > > machdep.xsave_features = 0
 >  >>  >  > > yenn# sysctl machdep.fpu_save
 >  >>  >  > > machdep.fpu_save = 1
 >  >>  >  > >
 >  >>  >  > > I'll try applying the patch today and building the kernel.
 >  >>  >  >
 >  >>  >  > sorry, I misread the code, it shouldn't make a functional difference
 >  >>  >  > either way.
 >  >>  >  >
 >  >>  >  > do you have a coredump in /var/crash?
 >  >>  >  > can you:
 >  >>  >  > gunzip netbsd.3.core.gz
 >  >>  >  > gunzip netbsd.3.gz
 >  >>  >  > crash -M netbsd.3.core -N netbsd.3
 >  >>  >  >
 >  >>  >  > crash> dmesg
 >  >>  >  > (only to confirm it died at the same spot)
 >  >>  >  > crash> examine x86_xsave_features
 >  >>  >  > crash> bt
 >  >>  >
 >  >>  >  I found one coredump from Sep 23 (sources were
 >  >>  >  dated around Sep 15.)
 >  >>  >
 >  >>  >  fatal privileged instruction fault in supervisor mode
 >  >>  >  trap type 0 code 0 rip 0xffffffff80224a52 cs 0x8 rflags 0x10016 cr2 0x75ba90c36d60 ilevel 0x8 rsp 0xfffffe804057bea8
 >  >>  >  curlwp 0xfffffe81318c2720 pid 391.2 lowest kstack 0xfffffe80405792c0
 >  >>  >  panic: trap
 >  >>  >  cpu1: Begin traceback...
 >  >>  >  vpanic() at netbsd:vpanic+0x140
 >  >>  >  snprintf() at netbsd:snprintf
 >  >>  >  startlwp() at netbsd:startlwp
 >  >>  >  alltraps() at netbsd:alltraps+0x96
 >  >>  >  fpudna() at netbsd:fpudna+0x61
 >  >>  >  cpu1: End traceback...
 >  >>  >
 >  >>  >  dumping to dev 18,1 (offset=132519, size=1032011):
 >  >>  >  dump
 >  >>  >  crash> examine x86_xsave_features
 >  >>  >  x86_xsave_features:     160b78a0
 >  >>
 >  >>  Looks like trash..
 >  >>
 >  >>  Please try:
 >  >>  examine x86_fpu_save_size
 >  >>  examine x86_fpu_save
 >  >>  examine i386_nocpuid_cpus
 >  >>
 >  >>  (checking if the stack has been damaged)
 >  >
 >  > yenn# crash -M netbsd.6.core -N netbsd.6
 >  > Crash version 8.0_BETA, image version 8.0_BETA.
 >  > System panicked: trap
 >  > Backtrace from time of crash is available.
 >  > crash> examine x86_fpu_save_size
 >  > x86_fpu_save_size:      200
 >  > crash> examine x86_fpu_save
 >  > x86_fpu_save:   1
 >  > crash> examine i386_nocpuid_cpus
 >  > i386_nocpuid_cpus:      1
 >  > crash>
 >  >
 >  
 >  So something is overwriting x86_xsave_features with trash.
 >  
 >  A valid value would look like this:
 >  $ sysctl machdep.xsave_features
 >  
 >  machdep.xsave_features = 7
 >  
 >  Unless I am missing something, the only place that sets this value is in:
 >  
 >  /src/sys/arch/x86/x86/identcpu.c: cpu_probe_fpu(struct cpu_info *ci)
 >  
 >  x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
 >  
 >  It would be easier to track it down with a reproducer, with temporary
 >  asserts.. but I expect that we are restricted to reading the code.
 >  
 >  A possible hand-made assert is to put a panic() like this:
 >  
 >  	/* Get features and maximum size of the save area */
 >  	x86_cpuid(0xd, descs);
 >  	if (descs[2] > 512)
 >  		x86_fpu_save_size = descs[2];
 >  
 >  +       panic("Oops how did we get here!\n");
 >  #ifdef XEN
 >  	/* Don't use xsave, force fxsave with x86_xsave_features = 0. */
 >  #else
 >  	x86_xsave_features = (uint64_t)descs[3] << 32 | descs[0];
 >  #endif
 >  
 >  Once it fires, we will need the stack trace.
 >  
 >  >>
 >  >>  >  crash> bt
 >  >>  >  _KERNEL_OPT_NARCNET() at 0
 >  >>  >  _KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x7
 >  >>  >  vpanic() at vpanic+0x149
 >  >>  >  snprintf() at snprintf
 >  >>  >  startlwp() at startlwp
 >  >>  >  calltrap() at calltrap+0x11
 >  >>  >  fpudna() at fpudna+0x61
 >  >>  >  crash>
 >  
 >  
 >  
 >  
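 Editorial sketch (not part of the original messages): the quoted
 line from cpu_probe_fpu() builds a 64-bit mask from two 32-bit
 CPUID registers, and the value seen in the dump can be checked
 against what such a mask may architecturally contain. The code
 below is a user-space illustration only. The helper names and the
 XCR0_* macros are hypothetical, and the mapping of descs[0] to EAX
 and descs[3] to EDX is an assumption based on the quoted line.
 The checks rely on two XCR0 rules from the Intel SDM: bit 0 (x87)
 must always be set, and AVX (bit 2) requires SSE (bit 1).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XCR0 state-component bits (Intel SDM); macro names are illustrative. */
#define XCR0_X87 (UINT64_C(1) << 0)
#define XCR0_SSE (UINT64_C(1) << 1)
#define XCR0_AVX (UINT64_C(1) << 2)

/*
 * Same expression as the line quoted from identcpu.c.
 * Assumption: descs[0] holds EAX (low 32 bits of the mask) and
 * descs[3] holds EDX (high 32 bits) from CPUID leaf 0xd.
 */
static uint64_t
compose_xsave_features(const uint32_t descs[4])
{
	return (uint64_t)descs[3] << 32 | descs[0];
}

/*
 * Hypothetical sanity check: returns false for masks that could not
 * have come from CPUID leaf 0xd on real hardware.
 */
static bool
xsave_features_plausible(uint64_t mask)
{
	/* 0 is what the sysctl output in this thread shows (xsave unused). */
	if (mask == 0)
		return true;
	/* The x87 bit is mandatory in any non-empty XCR0 mask. */
	if ((mask & XCR0_X87) == 0)
		return false;
	/* AVX state cannot be enabled without SSE state. */
	if ((mask & XCR0_AVX) != 0 && (mask & XCR0_SSE) == 0)
		return false;
	return true;
}
```

 Under these assumptions the value 7 (x87 | SSE | AVX) is accepted,
 while 0x160b78a0 from the crash dump is rejected because bit 0 is
 clear, which is consistent with the "looks like trash" assessment.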
 


