Port-dreamcast archive


Occasional wedging?



This is with 4.0.1, so, if you don't recall anything that old, feel
free to skip this.

My Dreamcast is running diskless, root on rtk0, console on scif0, doing
a build of the world.  But every few days it seems to wedge: the log
stops getting appended to, .o files stop appearing, etc.  Yet as soon
as I issue a command on the serial line, it wakes up just fine.

The curious thing is that it's keeping time and I'm not seeing "nfs
server...not responding", yet a loop I started to (a) keep the clock
somewhat disciplined and (b) try to work around the wedging doesn't run
either.

I started

# cd /usr/src
# sh ./build.sh -D /root/dreamcast/DESTDIR -O /root/dreamcast/OBJDIR -x -U build > /root/dreamcast/zbuild 2>&1 &
# while sleep 3600; do ntpdate 10.0.1.1; done &

(10.0.1.1 is the NTP server on that subnet, of course.)

I started this last Friday, and today it got stuck.  On the NFS server,
I have a tail +0f running on the logfile, piped into a program which
gives me timestamps.  The log got stuck at 08:52:07.35.  So I typed
"date" to the serial console, and here's a cut-and-paste of what I saw
(the first four lines had appeared before I typed anything):

17 Aug 09:42:08 ntpdate[16801]: step time server 10.0.1.1 offset 1.803118 sec
17 Aug 10:42:13 ntpdate[26959]: step time server 10.0.1.1 offset 1.797958 sec
17 Aug 11:42:19 ntpdate[26645]: step time server 10.0.1.1 offset 1.807759 sec
17 Aug 12:42:24 ntpdate[158]: step time server 10.0.1.1 offset 1.803505 sec
date
Mon Aug 17 14:09:28 UTC 2015
Dreamcast# 17 Aug 14:09:32 ntpdate[11419]: step time server 10.0.1.1 offset 2.617056 sec

which leads me to the observations above: it's keeping time (the time
was correct to within the clock's drift of about 1.8 sec/hr; the
Dreamcast is running in UTC, but local time is UTC-0400), but both the
build of the world and the shell loop had wedged.  Yet it was
responsive enough to accept and run the date command, and something
about doing that woke it back up again.  The build log picked up where
it left off and the once-an-hour ntpdate resumed; since then I've
seen

17 Aug 15:09:37 ntpdate[8563]: step time server 10.0.1.1 offset 1.810756 sec

leading me to think it has just phase-shifted from xx:42 to xx:09 -
exactly what I'd expect from all of userland wedging and then waking up
when I ran the date command.
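
As a rough sanity check on the numbers above, here is a small sketch
that turns the ntpdate lines into gaps and an implied drift rate.  The
function name ntp_gaps is made up for illustration, and the input is
just the HH:MM:SS stamps and offsets copied by hand from the output
above; it assumes all stamps fall on the same day.

```shell
#!/bin/sh
# ntp_gaps (hypothetical helper): read "HH:MM:SS offset" pairs on stdin
# and print, for each line after the first, the seconds since the
# previous ntpdate run and the offset scaled to seconds per hour.
# Assumes every timestamp is on the same day (no midnight wrap).
ntp_gaps() {
    awk '
    {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (NR > 1) {
            gap = now - prev
            printf "%s  gap %d s  offset %.3f s  drift %.2f s/hr\n", \
                $1, gap, $2, $2 * 3600 / gap
        }
        prev = now
    }'
}

# Timestamps and offsets copied from the ntpdate output quoted above.
ntp_gaps <<'EOF'
09:42:08 1.803118
10:42:13 1.797958
11:42:19 1.807759
12:42:24 1.803505
14:09:32 2.617056
15:09:37 1.810756
EOF
```

The normal iterations come out about 3605 s apart (the sleep plus the
time ntpdate itself takes), while the gap spanning the wedge is 5228 s.
The larger 2.617 sec offset over that longer gap still works out to
roughly 1.80 sec/hr, the same rate as the other intervals, which fits
the clock having kept running at its usual drift while userland was
stuck.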

It's not just the NFS server going away.  If I do that artificially
(eg, by pulling the NFS server's network cable), I get the expected

nfs server 10.0.1.2:/nfs/dreamcast: not responding

and then, on replugging,

nfs server 10.0.1.2:/nfs/dreamcast: is alive again

But I don't see those when it wedges, so I don't think it's just that.

Any idea what could be behind this?  If not, anything I could do to
help figure out what's going wrong?

					Mouse

