Port-sparc64 archive


[7.99.12] tda0 issue was: Ultrasparc III+ kernel panic



Eduardo Horvath wrote:
On Mon, 13 Apr 2015, BERTRAND Joël wrote:

	I have seen that. And I have seen another panic:

panic: cpu1: ipi_send: couldn't send ipi to UPAID 0 (tried 10000 times)
cpu1: Begin traceback...
cpu1: End traceback...
Frame pointer is at 0x2004e41
Call traceback:
  netbsd:cpu_reboot+0x208(182f828, 1, ffff, 77bb78, 1cce380, 1c97000) fp = 2004f01
  netbsd:vpanic+0x178(104, 0, 1852638, 1cb6800, f, 1c70740) fp = 2004fb1
  netbsd:panic+0x24(1852638, 20059a8, 1cdc800, 1cddaf8, 1cddc00, 104) fp = 2005061
  netbsd:sparc64_send_ipi_sun4u+0x1ac(1852638, 1, 0, 2710, fffffffffffffffe, 0) fp = 2005121
  netbsd:cpu_need_resched+0x54(f4240, 1018a80, 0, 0, 70, 0) fp = 20051d1
  netbsd:sched_changepri+0x64(2014000, 2, 2014000, 101db1d08, 101db1040, 2a) fp = 2005281
  netbsd:resetpriority+0x90(1043816c0, 2a, 0, 1, 101daec40, 101daedc0) fp = 2005331
  netbsd:sched_pstats+0x118(1043816c0, 0, 1c70868, 0, 10caf5510, 2a) fp = 20053e1
  netbsd:uvm_scheduler+0x60(64, 1c71000, 0, 101daedc0, 10caf5510, 1043816c0) fp = 2005491
  netbsd:main+0x83c(101d89f00, 1c70740, 1c70740, 101da2c80, 1c0a1fc, 18a0598) fp = 2005541
  netbsd:cpu_initialize+0x154(184d500, 10624dd3, 1c97800, 0, 101daee00, 1) fp = 2005621
  netbsd:100030+0(f0059840, 113800, 113c00, 111880, 111ce8, 1117f8) fp = fff33651

dumping to dev 25,1 offset 12291071

But I don't understand. With the same kernel, this Blade 2000 used to reboot
one or more times _per day_, and now its uptime is greater than 8 days. I have
saved the kernel image and core if you want them.

Well that's not terribly useful.

One CPU tried to tell another CPU something but the other CPU did not
respond.  It then panicked.  In this circumstance the interesting info is
the state of the unresponsive CPU.  An SIR would be much more useful in
this circumstance than a panic.
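
For readers unfamiliar with this failure mode, here is a minimal sketch of a bounded send-and-retry loop of the kind the panic message describes. It is not the NetBSD implementation: `ipi_send_bounded`, `ipi_try_fn`, and the two simulated targets are hypothetical names invented for illustration, and only the retry limit of 10000 is taken from the panic text above. It shows why the sender's backtrace says little: the sender only knows that the target never accepted the request.

```c
#include <stdbool.h>
#include <stdio.h>

/* Retry limit taken from the panic text ("tried 10000 times"). */
#define IPI_MAX_TRIES 10000

/* Hypothetical stand-in for one hardware dispatch attempt: returns true
 * when the target CPU accepts the interrupt. */
typedef bool (*ipi_try_fn)(int target, void *ctx);

/* Try to deliver to `target` up to IPI_MAX_TRIES times.
 * Returns the number of attempts used on success, or -1 on failure.
 * A real kernel would panic here; the sketch only reports the failure. */
int ipi_send_bounded(int target, ipi_try_fn try_once, void *ctx)
{
    for (int attempt = 1; attempt <= IPI_MAX_TRIES; attempt++) {
        if (try_once(target, ctx))
            return attempt;
    }
    fprintf(stderr, "ipi_send: couldn't send ipi to target %d (tried %d times)\n",
            target, IPI_MAX_TRIES);
    return -1;
}

/* Simulated target that is wedged and never accepts. */
bool never_accepts(int target, void *ctx)
{
    (void)target;
    (void)ctx;
    return false;
}

/* Simulated target that is busy for *ctx attempts, then accepts. */
bool accepts_after(int target, void *ctx)
{
    int *remaining = ctx;
    (void)target;
    if (*remaining > 0) {
        (*remaining)--;
        return false;
    }
    return true;
}
```

With `never_accepts` the loop exhausts its budget and returns -1, which is the situation in the traceback: everything the sender can report is its own call chain, not what the other CPU was doing.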

	Hello,

Some good news. Before patching locore.s with your suggestions, I rebuilt a 7.99.9 kernel from sources (with userland), and I had planned to investigate last Saturday. This 7.99.9 kernel is stable on my Blade 2000. I have obtained an uptime greater than 6 days (the system finally crashed when I tried to run /etc/rc.d/altqd restart, but that is not the same issue). With 7.99.6, under the same conditions, the same Blade 2000 panicked once or twice a day. I haven't seen any change in sparc64/sparc64 or sparc64/dev that would explain why 7.99.9 is stable and 7.99.6 wasn't.

So I rebuilt 7.99.12 from sources, and tda.c seems to be broken. In dmesg, tda0 prints:

tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values
tda0: skipping temp adjustment - no sensor values

and envstat only returns:
envstat: no drivers registered

but the fans do not run at maximum speed.

	Best regards,

	JKB


