Subject: netbsd-1-6 (@2003/08/10) frozen solid on 2-CPU AS4000, sitting at DDB now
To: NetBSD/alpha Discussion List <port-alpha@NetBSD.ORG>
From: Greg A. Woods <woods@weird.com>
List: port-alpha
Date: 08/13/2003 13:27:12
I had done a fair bit of mucking about with my new AS4000 since
upgrading it with a second CPU and some more RAM, but today I started
a full rebuild of /usr/xsrc (over NFS) and the machine froze solid not
long after the build began.  It answers no pings and gives no echo on
the serial console.

Now the first big problem is that I can't get it to drop into the
debugger when I send it a BREAK on the serial console.  However, I can
halt the CPU from the front panel or from the RCM, and continuing from
there does get me into the debugger:

	RCM>halt
	
	Focus returned to COM port
	
	halted CPU 0
	CPU 1 is not halted
	
	halt code = 1
	operator initiated halt
	PC = fffffc00004e1e48
	P00>>>cont
	
	continuing CPU 0
	CP - RESTORE_TERM routine to be called
	panic: user requested console halt
	Stopped in pid 14563 (as) at    cpu_Debugger+0x4:       ret     zero,(ra)
	db{0}> trace
	cpu_Debugger() at cpu_Debugger+0x4
	panic() at panic+0x168
	console_restart() at console_restart+0x74
	XentRestart() at XentRestart+0x90
	--- console restart (from ipl 4) ---
	schedclock() at schedclock+0x88
	interrupt() at interrupt+0x1c0
	XentInt() at XentInt+0x1c
	--- interrupt ---
	pool_cache_put() at pool_cache_put+0x20
	m_freem() at m_freem+0x178
	udp_input() at udp_input+0x698
	ip_input() at ip_input+0xde0
	ipintr() at ipintr+0xb4
	netintr() at netintr+0xa0
	softintr_dispatch() at softintr_dispatch+0x134
	esigcode() at esigcode+0x78
	prologue botch: displacement 216
	--- root of call graph ---
	db{0}> 

Now, what do I do with CPU#1?  There's no "prom" command in DDB, so I
can't jump back to the SRM and halt it, though I suppose I could try
using the RCM to halt CPU#0 again...

BTW, I find it really annoying that a console halt triggers a panic().
Is there really no way to continue the OS from DDB on alpha?

Unfortunately, since upgrading the RAM I no longer have enough space on
my current dump partition to hold a system core dump.

I'll keep it sitting at DDB for an hour or so in case someone has any
suggestions for gathering further information of use in debugging this
freeze....

BTW, 'top', which happened to be running at the time, showed:

load averages:  3.35,  3.57,  2.76                                     11:27:15
48 processes:  46 sleeping, 2 on processor
CPU states: 51.0% user,  0.0% nice,  0.5% system,  0.0% interrupt, 48.5% idle
Memory: 194M Act, 2016K Wired, 164M File, 1179M Free
Swap: 1024M Total, 1024M Free

  PID USERNAME PRI NICE   SIZE   RES STATE      TIME   WCPU    CPU COMMAND
14562 woods     64    0  7568K 7536K CPU/1      0:03 92.79% 12.94% cc1
14563 woods     -6    0  2320K 2112K piperd/0   0:00  1.40%  0.20% as
    8 root      18    0     0K  103M syncer/0   2:41  0.00%  0.00% [ioflush]
    7 root     -18    0     0K  103M reaper/0   0:34  0.00%  0.00% [reaper]
  133 root      10    0     0K  103M nfsidl/1   0:24  0.00%  0.00% [nfsio]
  211 root      18  -12  1304K 1320K pause/0    0:15  0.00%  0.00% ntpd
  132 root      10    0     0K  103M nfsidl/1   0:14  0.00%  0.00% [nfsio]
  592 root       2    0   656K  544K select/0   0:09  0.00%  0.00% rlogind
  249 root       2    0   656K  536K select/1   0:05  0.00%  0.00% rlogind
  134 root      10    0     0K  103M nfsidl/1   0:04  0.00%  0.00% [nfsio]
  106 root      10    0  1480K  256K nanosl/0   0:04  0.00%  0.00% ipmon
    9 root     -18    0     0K  103M aiodon/0   0:04  0.00%  0.00% [aiodoned]
29837 woods     28    0   544K  512K CPU/0      0:02  0.00%  0.00% top
 6791 woods     18    0   888K  840K pause/0    0:02  0.00%  0.00% ksh
  135 root      10    0     0K  103M nfsidl/0   0:02  0.00%  0.00% [nfsio]
 9322 woods     10    0  5712K 5632K wait/0     0:02  0.00%  0.00% make
  355 root       2    0   656K  544K select/0   0:02  0.00%  0.00% rlogind
  381 woods     18    0   856K  808K pause/1    0:01  0.00%  0.00% ksh
  130 root      10    0   696K 3624K mfsidl/1   0:01  0.00%  0.00% mount_mfs
  246 root      10    0   568K  392K nanosl/1   0:01  0.00%  0.00% cron
   99 root       2    0   576K  440K select/1   0:01  0.00%  0.00% syslogd
  594 woods     18    0   848K  800K pause/0    0:00  0.00%  0.00% ksh
  251 woods     18    0   848K  776K pause/1    0:00  0.00%  0.00% ksh
    4 root      10    0     0K  103M mlxzzz/0   0:00  0.00%  0.00% [mlxtask]
    5 root      10    0     0K  103M pmsres/0   0:00  0.00%  0.00% [pms0]
 8745 woods     10    0   704K  624K wait/0     0:00  0.00%  0.00% make
 8746 woods     10    0   640K  616K wait/1     0:00  0.00%  0.00% make
 8804 woods     10    0   632K  608K wait/1     0:00  0.00%  0.00% make
29742 woods     10    0   680K  600K wait/1     0:00  0.00%  0.00% make
29725 woods     10    0   616K  592K wait/0     0:00  0.00%  0.00% make
 8805 woods     10    0   632K  544K wait/0     0:00  0.00%  0.00% sh
 8747 woods     10    0   632K  544K wait/1     0:00  0.00%  0.00% sh
14559 woods     10    0   624K  536K wait/1     0:00  0.00%  0.00% sh
29741 woods     10    0   616K  528K wait/1     0:00  0.00%  0.00% sh


-- 
						Greg A. Woods

+1 416 218-0098                  VE3TCP            RoboHack <woods@robohack.ca>
Planix, Inc. <woods@planix.com>          Secrets of the Weird <woods@weird.com>