Subject: AS1200 SMP instability
To: None <port-alpha@netbsd.org>
From: David Hopper <dhop@nwlink.com>
List: port-alpha
Date: 02/05/2002 11:21:40
I've narrowed down the source of the instability on this
tincup platform.  With the multiprocessor code removed from the kernel, I
can now survive full builds without the disk crashes that I
mentioned earlier on the list.

The instability dates all the way back to when the multiprocessor code was
enabled (which makes a lot of sense now that I think of it: I have run SMP
ever since the changes were committed).  In other words, my kernel right now
is 1.5ZA (Jan 26), but the errors go back all the way through
1.5Y (November) and beyond.

Since going single-processor, I'm solid.

The first clue that the MP code was the culprit was this halt I received
yesterday in cc1plus during a build; it's different from the other debugger
halts outlined previously:

db{0}> show registers
v0	0x6
t0	0x1
t1	0x1
t2	0x10001	rn+0xffe1
t3	0xf423f	rn+0xf421f
t4	0xfffffc000062d798	lasttime.132
t5	0xfffffc00005e0168	microtime_slock.133
t6	0xfffffc0004df7640	end+0x47aefe8
t7	0xfffffc0004aaf14c	end+0x4466af4
s0	0x4
s1	0x8	rettmp
s2	0x102	rn+0xe2
s3	0x200086d20000
s4	0x200099122000
s5	0x31	rn+0x11
s6	0x12029b91c
a0	0x6
a1	0x1
a2	0x199	rn+0x179
a3	0
a4	0
a5	0
t8	0x1e	framesz+0xe
t9	0xfffffc00004f7d30	microtime+0xb0
t10	0x1116faa6ca6c0
t11	0x1fc1e058
ra	0xfffffe0023eb1688
t12	0xfffffc0000392280	spinlock_acquire_count
at	0xfffffc00005e0690	sched_whichqs
gp	0xfffffc00005d4fb8	special_symbols+0x8160
sp	0xfffffe0023eb1550
pc	0xfffffe0023eb168c
ps	0x6
ai	0x1fc1e058
pv	0xfffffc0000392280	spinlock_acquire_count
0xfffffe0023eb168c:	call_pal halt

db{0}> ps

PID	PPID	PGRP	UID	S	FLAGS	COMMAND	WAIT
>22799	22797	22796	0	7	0x84006	cc1plus
22797	22796	22796	0	3	0x84086	c++	wait
22796	22790	22796	0	3	0x84086	sh	wait
22790	22782	22233	0	3	0x84086	nbmake	select
22782	22237	22233	0	3	0x84086	sh	wait
22237	22233	22233	0	3	0x84086	nbmake	wait
22233	22232	22233	0	3	0x84086	sh	wait
22232	22231	241	0	3	0x84086	nbmake	select
22231	21324	241	0	3	0x84086	sh	wait
21324	21323	241	0	3	0x84086	nbmake	wait
21323	21320	241	0	3	0x84086	sh	wait
21320	21319	241	0	3	0x84086	nbmake	wait
21319	20599	241	0	3	0x84086	sh	wait
20599	20598	241	0	3	0x84086	nbmake	wait
20598	20597	241	0	3	0x84086	sh	wait
20597	20596	241	0	3	0x84086	nbmake	wait
20596	1102	241	0	3	0x84086	sh	wait
1102	241	241	0	3	0x84086	nbmake	wait
241	237	241	0	3	0x84086	sh	wait
237	234	237	0	3	0x84086	tcsh	pause
234	210	234	0	3	0x84086	csh	pause
228	224	228	0	3	0x84086	tcsh	ttyin
224	212	224	0	3	0x84086	csh	pause
212	211	212	150	3	0x84086	tcsh	pause
211	187	187	0	3	0x80084	sshd	select
210	1	210	150	3	0x84086	tcsh	pause
209	178	178	32767	3	0x80184	httpd	lockf
208	178	178	32767	3	0x80184	httpd	lockf
206	1	206	0	3	0x80084	cron	nanosle
205	178	178	32767	3	0x80184	httpd	lockf
204	178	178	32767	3	0x80184	httpd	lockf
203	178	178	32767	3	0x80184	httpd	select
198	1	198	0	3	0x80084	inetd	select
190	1	190	0	3	0x80084	sendmail	select
187	1	187	0	3	0x80084	sshd	select
178	1	178	0	3	0x80085	httpd	select
177	160	8	1000	3	0x84186	mysqld	select
160	1	8	0	3	0x84086	sh	wait
145	1	145	0	3	0x80084	ntpd	pause
76	1	76	0	3	0x80084	syslogd	select
14	0	0	0	7	0xa0204	raid
7	0	0	0	3	0xa0204	aiodoned	aiodone
6	0	0	0	3	0xa0204	ioflush	drainvp
5	0	0	0	3	0x20204	reaper	reaper
4	0	0	0	3	0xa0204	pagedaemon	pgdaemo
3	0	0	0	3	0xa0204	isp0:0	sccomp
2	0	0	0	3	0xa0204	siop0:0	sccomp
1	0	1	0	3	0x84084	init	wait
0	-1	0	0	3	0xa0204	swapper	schedul

spinlock_acquire_count: 27bb0024
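
A note on reading that last line: 27bb0024 looks like the first instruction
word of the function rather than a counter value. Assuming the standard Alpha
memory-instruction layout (opcode in bits 31:26, Ra in 25:21, Rb in 20:16,
16-bit signed displacement in 15:0), it decodes as the usual "ldah gp" function
prologue. The sketch below is only an illustration of that decoding; the helper
names and the small mnemonic and register tables are made up for this example
and cover just the cases needed here.

```python
# Hypothetical helper: split a 32-bit Alpha memory-format instruction word
# into (opcode, ra, rb, displacement). Field layout is assumed as described
# in the note above.
def decode_alpha_mem(word):
    opcode = (word >> 26) & 0x3F
    ra = (word >> 21) & 0x1F
    rb = (word >> 16) & 0x1F
    disp = word & 0xFFFF
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 0x10000
    return opcode, ra, rb, disp

# Deliberately tiny tables: only what this example needs.
MNEMONICS = {0x08: "lda", 0x09: "ldah"}
REG_NAMES = {27: "pv", 29: "gp", 30: "sp"}

def format_insn(word):
    opcode, ra, rb, disp = decode_alpha_mem(word)
    mnemonic = MNEMONICS.get(opcode, "op%#x" % opcode)
    ra_name = REG_NAMES.get(ra, "r%d" % ra)
    rb_name = REG_NAMES.get(rb, "r%d" % rb)
    return "%s %s,%#x(%s)" % (mnemonic, ra_name, disp, rb_name)

print(format_insn(0x27BB0024))
```

If that reading is right, the value is just the text at the function's entry
point, which fits pv/t12 both holding the address of spinlock_acquire_count in
the register dump.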