port-hp300: Re: yet another hit of the hp-forgets-how-to-fork

Subject: Re: yet another hit of the hp-forgets-how-to-fork
To: None <downsj@csos.orst.edu>
From: Jason Thorpe <thorpej@cs.orst.edu>
List: port-hp300
Date: 10/23/1994 10:38:32

It may be interesting to note:

I have a 380 with 32 MB of memory and root/swap/usr/ccd on hp-ib, too.  It
occasionally has this problem, along with
`panic: enter: out of address space'.  However, my 433, with 32 MB of
memory and root/swap/everything on SCSI, *never* does this.  Neither does
my other 380, which has 24 MB of memory and is all SCSI.

To be quite honest, I was beginning to think that the `freezing' effect
was the result of flaky hp-ib drives whose controller does not always ack
a command (downsj and I one-line-hacked autoconf to probe a non-responsive
disk twice to circumvent the problem at boot).  However, I'm now wondering
if it might be a quirk in the hp-ib code, since my SCSI machines do not
have this problem (and are often busier than the hp-ib ones).
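
For anyone curious about the probe-twice workaround mentioned above, the
idea can be sketched roughly as below.  This is NOT the actual autoconf
change, just an illustration of the retry logic; probe_fn,
probe_with_retry, and the flaky_disk stand-in are hypothetical names made
up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe callback: returns true if the device answered. */
typedef bool (*probe_fn)(void *dev);

/*
 * Probe a device up to max_attempts times, stopping at the first success.
 * Returns the number of attempts used, or 0 if the device never answered.
 */
static int
probe_with_retry(probe_fn probe, void *dev, int max_attempts)
{
	assert(probe != NULL && max_attempts > 0);
	for (int attempt = 1; attempt <= max_attempts; attempt++) {
		if (probe(dev))
			return attempt;
	}
	return 0;
}

/* Stand-in for a flaky drive: it ignores the first ignore_first probes. */
struct flaky_disk {
	int ignore_first;	/* number of probes that go unanswered */
	int seen;		/* number of probes received so far */
};

static bool
flaky_probe(void *dev)
{
	struct flaky_disk *d = dev;

	return ++d->seen > d->ignore_first;
}
```

With max_attempts set to 2, a drive that misses only its first probe is
still found, while one that never answers is skipped as before.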

On Sun, 23 Oct 1994 01:18:32 -0700 
 Jason Downs <downsj@CSOS.ORST.EDU> wrote:

 > 
 > >Submitter-Id:	net
 > >Originator:	
 > >Organization:
 > Computer Science Outreach Services, Oregon State University
 > >Confidential:	no
 > >Synopsis:	Yet another hit on the very old HP forgets how to fork bug
 > >Severity:	critical
 > >Priority:	high
 > >Category:	port-hp300
 > >Class:		sw-bug
 > >Release:	NetBSD 1.0A-Current (about a month old)
 > >Environment:	GCC 2.5.8
 > System: NetBSD nemesis 1.0_BETA NetBSD 1.0_BETA (NEMESIS) #20: Sun Oct 16 21:14:56 PDT 1994 root@:/usr/src/sys/arch/hp300/compile/NEMESIS hp300
 > 
 > 
 > >Description:
 > 	It would seem after some time of not bothering me or my machines,
 > 	the good old 'I can't fork any longer' bug has struck again, just now.
 > 
 > 	The machine is a 33Mhz/32Meg 380. It is my file server, running off
 > 	of HPIB.
 > 
 > 	While doing a rather massive find job over the source drive, the
 > 	machine hung while trying to fork/exec new processes. This is
 > 	similar to what happens when a machine runs out of swap, but
 > 	it *isn't*. The machine *rarely* swaps at all, and it certainly
 > 	wasn't doing so then-- and besides, it has 180megs of swap space.
 > 
 > 	This is identical to the problem I've been reporting off and on
 > 	for nearly a year.
 > 
 > 	What's interesting in this case is that up until a week ago, this
 > 	machine was a 370, and the problem hadn't surfaced in quite a long
 > 	time.
 > 
 > 	Perhaps some useful information:
 > NetBSD 1.0_BETA (NEMESIS) #20: Sun Oct 16 21:14:56 PDT 1994
 >     root@:/usr/src/sys/arch/hp300/compile/NEMESIS
 > HP9000/380/425 (25MHz MC68040 CPU+MMU+FPU, 4k on-chip physical I/D caches)
 > real mem = 33546240
 > avail mem = 27475968
 > using 819 buffers containing 3354624 bytes of memory
 > dma: 98620C with 2 channels, 32 bit DMA
 > hpib0 at sc7, ipl 3
 > ct0: 9144 streaming tape
 > ct0 at hpib0, slave 2
 > dca0 at sc9, ipl 5, flags 0x1
 > dca1 at sc11, ipl 5
 > scsi0: 32 bit dma, async, scsi id 7
 > scsi0 at sc12, ipl 4
 > dcm0 at sc13, ipl 3, flags 0xe
 > hpib1 at sc14, ipl 4
 > rd0: 7959B
 > rd0 at hpib1, slave 0
 > rd1: 7959B
 > rd1 at hpib1, slave 1
 > rd2: 7959B
 > rd2 at hpib1, slave 2
 > hpib2 at sc15, ipl 4
 > rd3: 7958A
 > rd3 at hpib2, slave 0
 > rd4: 7958A
 > rd4 at hpib2, slave 1
 > rd5: 7958A
 > rd5 at hpib2, slave 2
 > rd6: 7958A
 > rd6 at hpib2, slave 3
 > le0: hardware address 08:00:09:06:a8:60
 > le0 at sc21, ipl 5
 > dcm1 at sc28, ipl 3, flags 0xe
 > ccd0: 4 components (rd3h, rd4h, rd5h, rd6h), 1015808 blocks interleaved at 8192 blocks
 > ccd0 configured
 > 
 > machine		"hp300"
 > cpu		"HP370"
 > cpu		"HP380"
 > ident		NEMESIS
 > options		FPSP
 > 
 > timezone	7 dst
 > maxusers	32
 > 
 > # Standard options
 > options		SWAPPAGER,VNODEPAGER,DEVPAGER
 > options		INET
 > options		FFS
 > options		FIFO
 > options		MFS
 > options		KERNFS
 > options		FDESC
 > options		UNION
 > options		NFSSERVER
 > options		NFSCLIENT
 > options		PROCFS
 > options		"CD9660"
 > options		"COMPAT_NOMID"
 > options		"COMPAT_43"
 > options		"TCP_COMPAT_42"
 > options		"COMPAT_44"
 > 
 > # Options for all HP machines
 > options		SYSVSHM
 > options		SYSVSEM
 > options		SYSVMSG
 > 
 > # Options specific to this host.
 > #options		DDB
 > #options		DEBUG,DIAGNOSTIC
 > #options		PANICBUTTON,PANICWAIT
 > options		KTRACE
 > options		"NKMEMCLUSTERS=1024"
 > options		"HILVID=1"
 > options		PROFTIMER,"PRF_INTERVAL=500"
 > #options 	KGDB,"KGDBDEV=15*256+2","KGDBRATE=19200"
 > options		SPAM
 > options		USELEDS
 > 
 > config		netbsd root on rd0 swap on rd0b and rd1b
 > 
 > master		hpib0	at scode7
 > master		hpib1	at scode14
 > master		hpib2	at scode15
 > master		hpib3	at scode16
 > master		hpib4	at scode?
 > master		hpib5	at scode?
 > disk		rd0	at hpib1 slave 0
 > disk		rd1	at hpib1 slave 1
 > disk		rd2	at hpib1 slave 2
 > disk		rd3	at hpib2 slave 0
 > disk		rd4	at hpib2 slave 1
 > disk		rd5	at hpib2 slave 2
 > disk		rd6	at hpib2 slave 3
 > disk		rd7	at hpib3 slave ?
 > disk		rd8	at hpib3 slave ?
 > disk		rd9	at hpib3 slave ?
 > disk		rd10	at hpib? slave ?
 > disk		rd11	at hpib? slave ?
 > disk		rd12	at hpib? slave ?
 > tape		ct0	at hpib0 slave ?
 > tape		ct1	at hpib0 slave ?
 > 
 > master		scsi0	at scode?
 > master		scsi1	at scode?
 > disk		sd0	at scsi? slave ?
 > disk		sd1	at scsi? slave ?
 > disk		sd2	at scsi? slave ?
 > disk		sd3	at scsi? slave ?
 > disk		sd4	at scsi? slave ?
 > disk		sd5	at scsi? slave ?
 > disk		sd6	at scsi? slave ?
 > tape		st0	at scsi? slave ?
 > tape		st1	at scsi? slave ?
 > 
 > device		le0	at scode?
 > device		le1	at scode?
 > device		dca0	at scode9 flags 1
 > device		dca1	at scode?
 > device          dcm0    at scode? flags 0xe
 > device          dcm1    at scode? flags 0xe
 > 
 > #pseudo-device	sl	1
 > pseudo-device	bpfilter 16
 > pseudo-device	pty	80
 > pseudo-device	loop
 > pseudo-device	ether
 > pseudo-device   ccd0 on rd3h and rd4h and rd5h and rd6h interleave 8192
 > 
 > >How-To-Repeat:
 > 	Cause the wind to blow in the proper direction, while the moon is in
 > 	the proper phase.
 > >Fix:
 > 	Usually, reducing maxusers will avoid the problem, but it doesn't
 > 	seem to be the case any longer.

--------------------------------------------------------------------------
Jason R. Thorpe               thorpej@cs.orst.edu                 758-2003
Systems Administrator            CSWest Room 5                    737-5567
CS Dept, Oregon State University           http://www.cs.orst.edu/~thorpej
               "I brought my BOWLING BALL -- and some DRUGS!"
                      -- ztp