Subject: RE: cpu_reboot
To: 'Bill Sommerfeld' <sommerfeld@orchard.arlington.ma.us>
From: Mark Randelhoff <markr@cat.co.za>
List: port-i386
Date: 02/22/2000 08:19:45
> > I am trying to correctly reboot a system from within
> > the kernel, after a watchdog fails.
> >
> > The watchdog code is implemented with
> > timeout((void *)function_name, context, time),
> > which the watchdog function calls again on each run to
> > re-arm itself. When the watchdog times out, the timer is
> > not re-enabled and the cpu_reboot function is called.
>
> cpu_reboot() needs to run in process context so it can block waiting
> for i/o to complete, while timeout functions run in interrupt context.
> You can't call tsleep in an interrupt routine.
>
> You have a few options, depending on what sort of failure you're
> attempting to protect against:
>
>  - On x86, cpu_reset() should work in an interrupt routine, but it
> won't sync out the buffer cache and cleanly unmount the filesystem.

The code needs to run on x86 and (preferably) ARM systems, and it really
needs to unmount the file systems cleanly.

>  - you can spawn a kernel thread (look at sys/kern/kern_kthread.c),
> and have it tsleep() waiting for a wakeup() from your watchdog timeout
> routine, and have your timeout function wake it up; this can call
> cpu_reboot() for a clean shutdown; however, if the system is stuck
> looping at interrupt level, it will never get a chance to run.

I would really like to spawn the thread in the device driver's attach
routine. However, when I do, the kernel panics with a kernel page fault
trap (code = 0, stopped in _fork1 + 0xf). When I call kthread_create
from the watchdog device driver's open function instead, everything
seems OK. kthread_create passes proc0 as the first parameter to fork1,
and I guess proc0 is not yet set up when attach is called but is by the
time open is called. Would you mind confirming this? Can you suggest a
way around it?

> The truly paranoid would probably use a hardware watchdog timer.

There is a hardware watchdog, but just before it kicks in (assuming
I still have some control over the processor) I would like to
cleanly shut down the processes that have not died and caused
the watchdog alarm.

Thanks & kind regards
Mark