Subject: port-sparc64/20675: cron dies at startup
To: None <gnats-bugs@gnats.netbsd.org>
From: None <martin@duskware.de>
List: netbsd-bugs
Date: 03/12/2003 22:18:48
>Number:         20675
>Category:       port-sparc64
>Synopsis:       cron dies at startup
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    port-sparc64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 12 13:19:01 PST 2003
>Closed-Date:
>Last-Modified:
>Originator:     Martin Husemann
>Release:        NetBSD 1.6P
>Organization:
>Environment:
System: NetBSD night-porter.duskware.de 1.6P NetBSD 1.6P (PORTER) #0: Sat Mar 8 09:11:04 CET 2003 martin@insomnia.duskware.de:/usr/src/sys/arch/i386/compile/PORTER i386
Architecture: i386
Machine: i386
>Description:

On sparc64 machines, cron sometimes dies. Most of the lossage went away with
the merge of the SA branch, but occasionally cron still dies right after it
is started.

>How-To-Repeat:

Cron calls daemon(3), which fails, so cron takes the emergency exit.
Instrumentation shows that daemon fails because fork(2) returns -1.
Interestingly, errno is 0.
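The symptom can be probed outside cron with a small fork loop. This is a sketch, not part of the original analysis: the function name count_errno0_fork_failures is hypothetical, and since the failure is timing dependent, a zero count on one run proves nothing. The loop deliberately prints nothing on the success path, because printing was reported to hide the problem.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork `iterations` times and count how often
 * fork(2) returns -1 while leaving errno at 0 (the symptom seen in cron).
 * Each child exits at once; the parent reaps it so no zombies pile up.
 */
int count_errno0_fork_failures(int iterations)
{
    int anomalies = 0;

    assert(iterations > 0);
    for (int i = 0; i < iterations; i++) {
        errno = 0;                  /* a stale errno must not mask the bug */
        pid_t pid = fork();
        if (pid == 0)
            _exit(0);               /* child: leave immediately, no stdio flush */
        if (pid == -1) {
            if (errno == 0)
                anomalies++;        /* failure without an error code */
            else
                perror("fork");     /* a genuine failure, e.g. EAGAIN */
            continue;
        }
        /* parent: reap the child, retrying if interrupted by a signal */
        while (waitpid(pid, NULL, 0) == -1 && errno == EINTR)
            ;
    }
    return anomalies;
}
```

On a healthy system the count should be 0; on an affected sparc64 machine a nonzero count would confirm the "-1 with errno 0" result without involving cron or daemon(3).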

I verified that sys_fork does return 0 (success), and that the two register_t
values it returns (rval[0] and rval[1], which then go into the trap frame as
tf_out[0] and tf_out[1]) are sane: rval[0] is not negative and rval[1] is 0.

There seem to be timing issues involved here, since printing the rval
values reliably makes the problem go away.

This looks like a trap handler bug, as if the tf_out values in the trap
frame are overwritten by something else due to nested traps, or do not
reliably make it back to userland when returning to user mode. Or it is
some interaction with context switching (which might explain why this only
seems to happen in fork(2) and not in any other syscall).
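To see which corrupted register values could produce "-1 with errno 0" in userland, here is a small C model of how the libc fork() stub decodes the trap frame. It assumes the usual SPARC libc convention (carry condition code set means error with the error number in %o0; otherwise the stub returns %o0 & (%o1 - 1)). That convention, the struct, and the function name are assumptions made for illustration, not code taken from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two out registers plus the carry flag. */
struct fork_frame_model {
    int64_t out0;   /* tf_out[0]: rval[0], or the error number on failure */
    int64_t out1;   /* tf_out[1]: rval[1], 0 in the parent, 1 in the child */
    int carry;      /* nonzero if the kernel flagged an error */
};

/*
 * Assumed libc decoding: with carry set, errno = out0 and fork() returns -1;
 * otherwise it returns out0 & (out1 - 1), i.e. out0 (the child's pid) in the
 * parent (out1 == 0) and 0 in the child (out1 == 1).
 */
int64_t model_fork_return(const struct fork_frame_model *tf, int *err)
{
    assert(tf != NULL && err != NULL);
    if (tf->carry) {
        *err = (int)tf->out0;
        return -1;
    }
    return tf->out0 & (tf->out1 - 1);
}
```

Under this model, the values verified in sys_fork (rval[0] > 0, rval[1] == 0, carry clear) yield the child's pid. A -1 return with errno left at 0 would need either a spurious carry with tf_out[0] == 0, or tf_out[0] == -1 reaching userland with carry clear, which would fit the suspicion that the frame is clobbered on the way back.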

>Fix:
no idea yet.
>Release-Note:
>Audit-Trail:
>Unformatted: