port-alpha: clock drifts under certain conditions

Subject: clock drifts under certain conditions
To: None <port-alpha@netbsd.org>
From: Tobias Nygren <tnn@netilium.org>
List: port-alpha
Date: 11/22/2005 17:58:30

Heads up, alpha hackers :)

I've noticed a strange clock drift problem on my machines (AS4100
and PC164SX). When the box is idle, the NTP frequency error
reported by ntptime(8) is typically less than 10 ppm. But when
certain types of load are applied to the system, the frequency
error rockets to 512 ppm!
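
The frequency value that ntptime(8) prints can also be sampled from a
small program while a load test is running, which makes it easier to
see when the drift starts. The following is only a minimal sketch. It
assumes the system provides ntp_adjtime(2) in <sys/timex.h> and that
the freq field is in ppm scaled by 2^16. The file name and all
function names are hypothetical and not part of the original tests.

```c
/* freqpoll.c: hypothetical helper, not one of the original test programs */
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>

/* Assumption: struct timex reports frequency as ppm scaled by 2^16. */
double scaled_to_ppm(long scaled) {
        return (double)scaled / 65536.0;
}

/* Query the kernel's frequency offset without modifying the clock.
 * Returns 0 on success, -1 if ntp_adjtime() fails. */
int read_freq_ppm(double *ppm) {
        struct timex tx;
        memset(&tx, 0, sizeof(tx));
        tx.modes = 0;           /* no mode bits set: read-only query */
        if (ntp_adjtime(&tx) == -1)
                return -1;
        *ppm = scaled_to_ppm((long)tx.freq);
        return 0;
}

/* Print one sample; returns 0 on success, -1 on failure. */
int print_freq_sample(void) {
        double ppm;
        if (read_freq_ppm(&ppm) == -1) {
                perror("ntp_adjtime");
                return -1;
        }
        printf("freq %.3f ppm\n", ppm);
        return 0;
}
```

Calling print_freq_sample() once a minute from a main() loop while one
of the load programs runs should show whether the value climbs
steadily or jumps.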

For debugging purposes I run with only one processor to rule out
any problems in the IPI code. (This is on -current.)

Here are my discoveries so far:

/* hogcache.c (build with -O0) */
#include <stdlib.h>
int main(void) {
        int i;
        char *c = malloc(1024*4096);
        if (c == NULL) return 1;
        /* rewrite the whole 4 MB buffer forever */
        for (;;) for (i = 0; i < 1024*4096; i++) c[i] = (char)i;
}

The clock still keeps time fine with 3 of them running,
so no problem with TLB or task switching.

/* hogsyscall.c */
#include <unistd.h>
int main(void) {
        for(;;) gethostname(0,0); // dummy syscall
}

No problem here; syscalls are OK.
The system spends 98% of its time in the kernel.

/* badcode.c */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) {
        char *buf;
        int fd;
        fd = open("/dev/zero", O_RDONLY);
        buf = malloc(1024*1024*16);
        if (fd == -1 || buf == NULL) return 1;
        /* read 16 MB from /dev/zero in a tight loop */
        for (;;) read(fd, buf, 1024*1024*16);
}

Now this is where it gets interesting. Running one instance
of "badcode" for 15 minutes causes the message
ntpd[2814]: frequency error 512 PPM exceeds tolerance 500 PPM
to appear.

I suspect the problem is somehow related to spl and/or PALcode.
Any ideas?

Regards,
-Tobias