NetBSD-Bugs archive


port-arm/52603: arm(v7?) vfp register corruption



>Number:         52603
>Category:       port-arm
>Synopsis:       arm(v7?) vfp register corruption
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 08 16:20:00 +0000 2017
>Originator:     Manuel Bouyer
>Release:        NetBSD 8.0_BETA
>Organization:
>Environment:
System: NetBSD chartplotter 8.0_BETA NetBSD 8.0_BETA (CHARTPLOTTER) #1: Sat Sep 9 13:55:40 CEST 2017 bouyer%bop.soc.lip6.fr@localhost:/dsk/l1/misc/bouyer/tmp/earmv7hf/obj/dsk/l1/misc/bouyer/netbsd-8/src/sys/arch/evbarm/compile/CHARTPLOTTER evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
	Running pkgsrc/geography/opencpn on an Olimex Lime2 and a Cubieboard2
	(both Allwinner A20), I got evidence of occasional
	floating-point register corruption (a printf at a strategic point
	shows that a variable computed from other values a few lines before
	has the wrong value).
	I then tried this test program:
cat > fptest.c << EOF
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <err.h>
#include <sys/time.h>

#define NREGS 32

void do_test(int);
void do_wait(int);

double foo = 0;


void
do_wait(int id) {
	struct timeval start, now;
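	/* Only job 0 busy-waits; every other job returns at once,
	 * so the else branch in the loop below is never reached. */
	if (id != 0)
		return;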
	// sleep(1);
	gettimeofday(&start, NULL);
	while (1) {
		if (id == 0) {
			foo = foo * 0.1;
		} else {
			int i;
			for (i = 0; i < 1000000000; i++) {
				if (id == 0)
				;
			}
		}
		gettimeofday(&now, NULL);
		if (now.tv_sec - start.tv_sec > 1)
			return;
	}
}

void
do_test(int id)
{
	double *src = malloc(sizeof(double) * NREGS);
	double *dst = malloc(sizeof(double) * NREGS);
	int i;

	printf("start job %d for %d registers\n", id, NREGS);

	for (i = 0; i < NREGS; i++) {
		src[i] = id * 100 + i * 1.1;
	}

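	/* Each pass: load known values into the VFP registers, call
	 * memset() and do_wait(), then store the registers back and
	 * compare them with the recomputed expected values. */
	while (1) {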
		foo = foo * 0.1;
		__asm __volatile("vldmia %0, {d0-d15}" :: "r" (src) : "memory");
#if NREGS > 16
		__asm __volatile("vldmia %0, {d16-d31}" :: "r" (&src[16]) : "memory");
#endif
		memset(dst, 0, sizeof(double) * NREGS);
		do_wait(id);
		__asm __volatile("vstmia %0, {d0-d15}" :: "r" (dst) : "memory");
#if NREGS > 16
		__asm __volatile("vstmia %0, {d16-d31}" :: "r" (&dst[16]) : "memory");
#endif
		if (id == 0)
			continue;
		for (i = 0; i < NREGS; i++) {
			double v = id * 100 + i * 1.1;
			if (dst[i] != v) {
				printf("%d: %lf %lf %lf\n", i, src[i], dst[i], v);
			}
		}
	}
}

int
main(int argc, const char *argv[])
{
	int n;
	int i;

	if (argc != 2) {
		errx(1, "usage: fptest <n>");
	}

	n = atoi(argv[1]);

	for (i = 1; i < n; i++) {
		switch(fork()) {
		case -1:
			err(1, "fork");

		case 0:
			do_test(i);
			exit(0);

		default:
			break;
		}
	}
	do_test(0);
	exit(0);
}
EOF
	compile with
gcc -g -mfpu=neon-vfpv4 -o fptest fptest.c

	running 
./fptest 2
	in parallel with opencpn, after a few hours I got
0: 100.000000 100.000000 28.000000

	and then, less than a day later:
0: 100.000000 11.672723 100.000000
1: 101.100000 16.850291 101.100000
2: 102.200000 6.424029 102.200000
3: 103.300000 16.679222 103.300000
4: 104.400000 255.000000 104.400000
5: 105.500000 255.000000 105.500000
6: 106.600000 255.000000 106.600000
7: 107.700000 0.002048 107.700000
8: 108.800000 0.087582 108.800000
9: 109.900000 0.500000 109.900000

	so we have rare but obvious vfp register corruption.
	I suspect it's related to an interrupt occurring at the wrong
	time, but couldn't track it down more than that.

>How-To-Repeat:
	See above. If you're not running opencpn, you may need to run
	another heavy FP application, or start more than 2 processes when
	invoking the test program.
>Fix:
	please ...


