Subject: Re: A challenge for the list
To: None <port-macppc@netbsd.org>
From: Donald Lee <donlee_ppc@icompute.com>
List: port-macppc
Date: 04/19/2002 21:22:42
>On Fri, Apr 19, 2002 at 06:39:46AM -0500, Donald Lee wrote:
>> Any guesses from the list about what might have been wrong?
>
>Though other suggestions here make sense, this sounds to me as if
>your process table was full. (Thus root couldn't get a shell after
>a su.)

I don't think so.  Although su failed, I could do almost
everything else.  I logged on and off, and tried out the web server and
a few other services.  Everything seemed normal, except that the
system log had stopped abruptly 30 hours earlier, su did not work,
and the console did not respond.

It was clearly terminal, but in the "early stages".

>> Any clever ideas on how to get a console when su won't work, and
>> the console is not responding? (at all).
>
>Not much to do in that situation.
>
>One option is to add your user account to the operator group. Then
>you could issue a shutdown -r now. (Which, subsequently, wouldn't
>work for the same reason that su wouldn't. Make you feel better?
>;^>)

I think you might make a lousy guide, but a good companion. ;->

>> I'd send_pr it, but I don't have much useful information to provide.
>> I don't *really* even have a good description of what was wrong.
>
>Might have been nice to check your kernel log. (You know, the output
>of dmesg.) Does mac68k keep that between boots? If so, then you
>might still be able to find it. (/var/run/dmesg is probably useless,
>as it's probably just the messages from this boot.)

Interestingly enough, I've spent some time reading the tea leaves:
all the logs that I could find.  The ones that needed root access I obviously
could not read until after the reboot.  I did check dmesg before
the reboot, and there was nothing clueful there.  HOWEVER, one very strange
thing turned up in this excerpt from the /var/log/messages file:

>Apr 17 10:44:08 mercy pppd[1114]: Serial link disconnected.
>Apr 17 10:44:09 mercy pppd[1114]: Exit.
>Apr 17 10:44:10 mercy pppd[1176]: pppd 2.4+.0dgl.0 started by root, uid 0
>Apr 17 10:54:56 mercy /netbsd: cy0: Do manual int svc! icount 0xb7c0
>Apr 17 11:09:26 mercy pppd[1176]: Using interface ppp1
>Apr 17 11:09:26 mercy pppd[1176]: Connect: ppp1 <--> /dev/ttyCY01
>Apr 17 11:09:29 mercy pppd[1176]: user bllc logged in
>Apr 17 11:09:30 mercy pppd[1176]: found interface epic0 for proxy arp
>Apr 17 11:09:30 mercy pppd[1176]: local  IP address 209.46.8.67
>Apr 17 11:09:30 mercy pppd[1176]: remote IP address 209.46.8.92
>Apr 17 11:09:35 mercy /netbsd: cy0: Do manual int svc! icount 0xb800
>Apr 17 11:30:16 mercy pppd[1176]: LCP terminated by peer (^@^@^@^@^@^@)
>Apr 17 11:30:19 mercy pppd[1176]: Connection terminated.
>Apr 17 11:30:19 mercy pppd[1176]: Connect time 20.9 minutes.
>Apr 17 11:30:19 mercy pppd[1176]: Sent 602792 bytes, received 43833 bytes.
>Apr 17 11:30:27 mercy pppd[1176]: Serial link disconnected.
>Apr 17 11:30:28 mercy pppd[1176]: Exit.
>Apr 17 11:30:28 mercy pppd[1224]: pppd 2.4+.0dgl.0 started by root, uid 0
>Apr 19 06:15:39 mercy syslogd: restart
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb880
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb8c0
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb900
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb940
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb980
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xb9c0
>Apr 19 06:15:40 mercy /netbsd: cy0: Do manual int svc! icount 0xba00
>Apr 19 06:15:40 mercy /netbsd: NetBSD 1.5.2 (try2) #3: Wed Sep 19 08:32:41 CDT 2001
>Apr 19 06:15:40 mercy /netbsd:     donlee@mercy:/usr/src.new/sys/arch/macppc/compile/try2
>Apr 19 06:15:40 mercy /netbsd: CPU: 750 (Revision 202)
>Apr 19 06:15:40 mercy /netbsd: total memory = 98304 KB
>Apr 19 06:15:40 mercy /netbsd: avail memory = 85280 KB
>Apr 19 06:15:40 mercy /netbsd: using 1254 buffers containing 5016 KB of memory
>Apr 19 06:15:41 mercy /netbsd: mainbus0 (root)
>Apr 19 06:15:41 mercy /netbsd: cpu0 at mainbus0: 512KB backside cache
>Apr 19 06:15:41 mercy /netbsd: bandit0 at mainbus0
>Apr 19 06:15:41 mercy /netbsd: pci0 at bandit0 bus 0
>Apr 19 06:15:41 mercy /netbsd: pci0: i/o space, memory space enabled
>Apr 19 06:15:41 mercy /netbsd: pchb0 at pci0 dev 11 function 0
>Apr 19 06:15:41 mercy /netbsd: pchb0: Apple Computer Bandit Host-PCI Bridge (rev. 0x03)
>Apr 19 06:15:41 mercy /netbsd: epic0 at pci0 dev 13 function 0: SMC 83c170 Fast Ethernet, rev. 9
>Apr 19 06:15:41 mercy /netbsd: epic0: interrupting at irq 23
>Apr 19 06:15:41 mercy /netbsd: epic0: SMC9432TX_1, Ethernet address 00:e0:29:9e:93:32
>Apr 19 06:15:41 mercy /netbsd: ukphy0 at epic0 phy 3: Generic IEEE 802.3u media interface
>Apr 19 06:15:41 mercy /netbsd: ukphy0: OUI 0x000895, model 0x0021, rev. 11
>Apr 19 06:15:41 mercy /netbsd: ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
>Apr 19 06:15:41 mercy /netbsd: cy0 at pci0 dev 15 function 0: interrupting at ir

Besides the jump in timestamps from the 17th to the 19th, notice the "Do manual int svc!" messages.
These are kernel messages generated by the Cyclades (cy) driver, which I modified.  The
number is a counter of "missed" interrupts since boot.  I normally get several of these
a day.  This counter is a compiled-in *static*, which gets re-initialized to zero on
boot.  Yet on reboot, these messages appeared between the syslogd restart and the kernel
banner, with values continuing upward from the ones logged on the 17th.
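For anyone who hasn't poked at this kind of driver hack, here is a rough C sketch of what such a counter looks like.  It is hypothetical: my actual cy driver changes aren't shown here, so the function name, the variable name, and the reporting interval of 0x40 are assumptions (the interval is only inferred from the log, where successive counter values differ by 0x40).  The point is that a file-scope static starts at zero in a freshly booted kernel, so a new kernel would have to miss tens of thousands of interrupts before it could print 0xb880.

```c
#include <stdio.h>

/*
 * Hypothetical sketch, not the real modified cy driver.
 * ASSUMPTION: a message is emitted every 0x40 missed interrupts,
 * inferred from the 0x40 spacing of the logged icount values.
 */
#define CY_MISSED_REPORT_INTERVAL 0x40

/* File-scope static: zero when a freshly loaded kernel image starts. */
static unsigned long cy_missed_icount;

/*
 * Hypothetical hook, called whenever the driver has to service a
 * missed interrupt by hand.  Returns 1 if it emitted a message.
 */
int
cy_note_missed_intr(void)
{
	cy_missed_icount++;
	if ((cy_missed_icount % CY_MISSED_REPORT_INTERVAL) == 0) {
		/*
		 * In a real kernel this would be the kernel's printf/log,
		 * which feeds the message buffer that syslogd reads;
		 * stdio printf stands in for it here.
		 */
		printf("cy0: Do manual int svc! icount 0x%lx\n",
		    cy_missed_icount);
		return 1;
	}
	return 0;
}
```

Starting from a zeroed counter, 64 calls to cy_note_missed_intr() print exactly one line, "cy0: Do manual int svc! icount 0x40".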

I did not restart this machine gently.  I hit the hardware reset button.  How did these values
survive to show up after the reboot???

-dgl-