Subject: Suboptimal VM behavior
To: None <port-sun3@NetBSD.ORG>
From: der Mouse <mouse@Holo.Rodents.Montreal.QC.CA>
List: port-sun3
Date: 01/25/1997 17:13:01
I wrote a picture-display program that can use mmap() on temporary
files to get memory backed by filesystem space rather than swap space
(and, though I didn't know it at the time, to partially sidestep the
data size limit in the process).
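
For anyone unfamiliar with the technique, here is a minimal sketch of
file-backed memory via mmap().  It is an illustration only, not the
picture-display program's actual code; the helper names and the /tmp
location are assumptions made for the example.

```c
#define _XOPEN_SOURCE 700   /* request POSIX declarations */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not from the original program): return `len`
 * bytes of read/write memory backed by a temporary file instead of
 * swap.  Returns NULL on failure.  The /tmp path is an assumption.
 */
static void *file_backed_alloc(size_t len)
{
    char path[] = "/tmp/fbmemXXXXXX";
    void *p;
    int fd;

    fd = mkstemp(path);            /* create and open a unique file */
    if (fd < 0)
        return NULL;
    unlink(path);                  /* disk space is freed once unmapped */
    if (ftruncate(fd, (off_t)len) < 0) {   /* grow the file to len */
        close(fd);
        return NULL;
    }
    /* MAP_SHARED makes the file, not swap, the backing store. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping outlives the descriptor */
    return (p == MAP_FAILED) ? NULL : p;
}

/* Release a region obtained from file_backed_alloc(). */
static void file_backed_free(void *p, size_t len)
{
    if (p != NULL)
        munmap(p, len);
}
```

A caller would use file_backed_alloc(n) where it would otherwise use
malloc(n), and file_backed_free(p, n) in place of free(p).  Pages of
such a region get written back to the temporary file rather than to the
swap partition.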

I was using this program heavily today, and observed some rather
disquieting behavior.  Sometimes, when a picture was buried or mostly
buried for a time and then raised, it would have a band of garbage
across it.  This was bad enough...but then, one of my emacs buffers (on
the same machine as the picture displaying program, running under the
same UID, but otherwise unrelated) glitched out, dropping a chunk of
NULs over part of the buffer and copying another part over a third.
Then emacs glitched again, less severely.

Then one of my terminal windows went away, apparently because either
the shell or the terminal emulator died for no visible reason.

Then my ssh-agent process went south; I have a window that's run on a
remote machine over an ssh-forwarded X connection, with agent
forwarding.  I tried to use ssh a la rsh on this remote machine and got
a failure I've never seen before about reading from something->fd;
running "ssh-add -l" on the remote machine (via the forwarding) hung.
Running "ssh-add -l" locally (same agent process, started at login,
before I ran xinit) produced "Broken pipe".

This machine has never done "everything dumps core" in my experience.
Not once, out of all the months I've been running NetBSD/sun3 on it.
However, this is unsettlingly close to "everything dumps core";
_something_ appeared to have been taking more or less random potshots
at user-land VM, and I have an unsubstantiated subjective impression
that only VM that got paged out ever got hit.  It could even be that
only mmap()-backed VM ever got hit, though that is pure speculation.

I cannot reproduce this 100%, but based on my experiences today, I
believe I can provoke it within a few minutes of trying, if anyone has
experiments it might be useful to try.
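
As one illustration of such an experiment (a sketch only, not something
that was run for this report): fill a large region with an
offset-dependent pattern, leave it idle long enough to be paged out,
then count the bytes that no longer match.  The function names and the
pattern itself are made up for the example.

```c
#include <stddef.h>

/*
 * Hypothetical pattern: each byte depends on its offset and a seed, so
 * a page that is zeroed or replaced by a copy of a different page no
 * longer matches what is expected at that offset.
 */
static unsigned char pattern_byte(size_t off, unsigned seed)
{
    return (unsigned char)(off ^ (off >> 8) ^ (off >> 16) ^ seed);
}

/* Write the pattern over the whole region. */
static void fill_pattern(unsigned char *buf, size_t len, unsigned seed)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = pattern_byte(i, seed);
}

/* Return the number of bytes that differ from the expected pattern. */
static size_t count_mismatches(const unsigned char *buf, size_t len,
                               unsigned seed)
{
    size_t i, bad = 0;

    for (i = 0; i < len; i++)
        if (buf[i] != pattern_byte(i, seed))
            bad++;
    return bad;
}
```

Running fill_pattern() on both an mmap()-backed region and an ordinary
malloc()ed one, then calling count_mismatches() on each after a spell of
memory pressure, would help show whether only file-backed pages are
affected.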

The system is running approximately NetBSD 1.2 (the kernel identifies
itself as 1.2_BETA, built in early December 1996).  I do have some
patches, but certainly nothing that should do _this_.  How does the VM
system in -current compare to that in 1.2?  Is it reasonable to try to
push this machine to -current to fix this?

Here's dmesg output:

Copyright (c) 1982, 1986, 1989, 1991, 1993
	The Regents of the University of California.  All rights reserved.

NetBSD 1.2_BETA (DAILY_PLANET) #0: Wed Dec  4 11:26:46 EST 1996
    mouse@Daily-Planet.Rodents.Montreal.QC.CA:/mouse/sources/working-usr-src/sys/arch/sun3/compile/DAILY_PLANET
Model: Sun 3/260 (hostid 130030f3)
fpu: mc68881
real mem = 25149440
avail mem = 21250048
using 307 buffers containing 2514944 bytes of memory
cache enabled
mainbus0 (root)
obio0 at mainbus0
zsc0 at obio0 addr 0x0 level 6 (softpri 3)
kbd0 at zsc0 channel 0 (console)
ms0 at zsc0 channel 1
zsc1 at obio0 addr 0x20000 level 6 (softpri 3)
zstty0 at zsc1 channel 0
zstty1 at zsc1 channel 1
eeprom0 at obio0 addr 0x40000
clock0 at obio0 addr 0x60000 level 5
memerr0 at obio0 addr 0x80000 (ECC memory)
intreg0 at obio0 addr 0xa0000
ie0 at obio0 addr 0xc0000 level 3 hwaddr 08:00:20:00:e8:a9
obmem0 at mainbus0
bwtwo0 at obmem0 addr 0xff000000 (1152x900)
vmes0 at mainbus0
si0 at vmes0 addr 0xff200000 level 2 vector 0x40 : options=3
scsibus0 at si0
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST12400N, 8650> SCSI2 0/direct fixed
sd0: 2048MB, 2621 cyl, 19 head, 84 sec, 512 bytes/sec x 4194685 sectors
scsi_inqmatch: 26/0/0 <RODIME  , RO3000S         , >
scsi_inqmatch: 2/0/0 <, , >
sd8 at scsibus0 targ 1 lun 0: <RODIME, RO3000S, 2.40> SCSI1 0/direct fixed
sd8: 43MB, 680 cyl, 5 head, 26 sec, 512 bytes/sec x 88400 sectors
si1 at vmes0 addr 0xff204000 level 2 vector 0x41 : options=3
scsibus1 at si1
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd1 at scsibus1 targ 0 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd1: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd2 at scsibus1 targ 1 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd2: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd3 at scsibus1 targ 2 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd3: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd4 at scsibus1 targ 3 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd4: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd5 at scsibus1 targ 4 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd5: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd6 at scsibus1 targ 5 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd6: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
scsi_inqmatch: 2/0/0 <, , >
scsi_inqmatch: 2/0/0 <, , >
sd7 at scsibus1 targ 6 lun 0: <QUANTUM, LP80S  980809404, 3.3> SCSI2 0/direct fixed
sd7: 80MB, 921 cyl, 4 head, 44 sec, 512 bytes/sec x 164139 sectors
cgtwo0 at vmes0 addr 0xff400000 level 4 vector 0xa8 (1152x900)
vmel0 at mainbus0
root on sd0a
swap on sd0b
dump on sd0b
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
si_intr: spurious from SBC
isr_vectored: vector=0x41 (not claimed)
ie0: TDR detected an open 8192 clocks away

					der Mouse

			       mouse@rodents.montreal.qc.ca
		     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B