Subject: 2.0 garbaged data - maybe nfs issue?
To: None <tech-kern@netbsd.org>
From: der Mouse <mouse@Rodents.Montreal.QC.CA>
List: tech-kern
Date: 01/24/2005 19:45:15
I recently did a 2.0 install.  To get some other relevant software on
it, I copied source files from the NFS server I got the install sets
off of.  After doing the copies (cd /mnt/local/src; cp -r c-publish
makefiles makewrapper /local/src), I found that a lot of the copied
files contained utter garbage - binary junk bearing no visible
relationship to the source file.

The really weird part is that one of them looked right when examined
with less or vi, but when I threw cc at it, I got errors like

"z.c", line 1:64: warning: null character(s) ignored
"z.c", line 1: error: stray '\360' in program
"z.c", line 1:66: warning: null character(s) ignored
"z.c", line 1: error: syntax error at '@' token
"z.c", line 1:68: warning: null character(s) ignored
"z.c", line 1: error: stray '\266' in program
"z.c", line 1: error: stray '\204' in program
"z.c", line 1:73: warning: null character(s) ignored

(Those are actually from a deliberate attempt to compile a binary file,
to check - the actual errors were of the same general flavour: lots of
"stray (character) in program" and "null character(s) ignored".)

And the *really* weird part: the line number that this starts happening
at, according to the compiler messages, is just after the last line of
the file according to less/vi/etc.
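
A rough way to pin down where the junk begins is to scan the raw bytes
for the first NUL or high-bit byte and compare that offset with the size
stat reports.  This is only a sketch in Python; the function names are
invented for illustration, and treating NUL and bytes >= 0x80 as
"suspicious" is an assumption based on the cc errors quoted above (it
would misfire on source that legitimately contains non-ASCII text).

```python
import os


def first_suspicious_offset(data):
    """Return the offset of the first NUL or high-bit byte, or None.

    Assumption: plain C source should contain neither, so the first such
    byte marks roughly where the junk starts.
    """
    for offset, byte in enumerate(data):
        if byte == 0 or byte >= 0x80:
            return offset
    return None


def describe_file(path):
    """Return (size per stat, bytes actually read, first suspicious offset)."""
    stat_size = os.stat(path).st_size
    with open(path, "rb") as f:
        data = f.read()
    return stat_size, len(data), first_suspicious_offset(data)
```

Calling describe_file() on one of the affected files and comparing the
three numbers should show whether the junk sits inside the file's
recorded size or only turns up when the file is actually read.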

This sounded reminiscent of recent list mail about the data between a
file's EOF and the end of the page containing the last byte.  Based on
that, I blew the file away entirely with rm and copied via a mechanism
that did not involve NFS at any point.  This copy worked fine.
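
For anyone wanting to check a copy against its original, something
along these lines might do.  It is a sketch in Python, not anything
taken from the system in question: the function names are made up, and
the 4096-byte page size is an assumption (it matches the page-size the
iommu reports on this machine, but the VM page size could differ
elsewhere).

```python
def first_mismatch(a, b):
    """Return the offset of the first differing byte, or None if equal.

    If one buffer is a strict prefix of the other, the mismatch is
    reported at the length of the shorter one.
    """
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            return i
    if len(a) != len(b):
        return limit
    return None


def bytes_past_eof_in_page(size, page_size=4096):
    """Bytes between EOF and the end of the page holding the last byte."""
    return (-size) % page_size


def compare_files(path_a, path_b):
    """Read both files whole and return the first mismatching offset."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_mismatch(fa.read(), fb.read())
```

A mismatch offset equal to the original's length, on a file whose size
is not a multiple of the page size, would at least be consistent with
the EOF-to-end-of-page theory; a mismatch in the middle of the file
would point somewhere else.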

One person I've corresponded with off-list about this says

> I suspect it happens with slower machines, and most people don't run
> into it.

So it's probably relevant to add: this was on a SPARCstation LX.

mainbus0 (root): SUNW,SPARCstation-LX: hostid 8033ea94
cpu0 at mainbus0: TMS390S10 @ 50 MHz, on-chip FPU
cpu0: physical 4K instruction (32 b/l), 2K data (16 b/l): cache enabled

I haven't tried to replicate any part of the problem; this is mostly a
note in case anyone has any suggestions or ideas.  I can do some
limited amount of testing on this system, if anyone has anything to
suggest.

In passing, I'd also note that something is borked with the sd driver,
even if it's only the messages it prints.  My boot and root disk (the
only disk the machine has, in fact) shows up as

sd0 at scsibus1 target 0 lun 0: <IBM, MXVS36D, 0100> disk fixed
sd0: drive offline
sd0: async, 8-bit transfers, tagged queueing

and, well, I'm sorry, but it just loaded the kernel off that drive;
it's really not plausible that it's offline.

In case it matters, the rest of the autoconf tree leading to sd0 is

iommu0 at mainbus0 addr 0x10000000: version 0x1/0x4, page-size 4096, range 64MB
sbus0 at iommu0: clock = 25 MHz
esp1 at sbus0 slot 1 offset 0x8800000 level 3 (ipl 5): FAS366/HME, 40MHz, SCSI ID 7
scsibus1 at esp1: 16 targets, 8 luns per target

I can't very well put the disk on scsibus0 to compare because I've been
totally unable to make that disk work on any narrow bus (and scsibus0
on the LX is narrow - esp1 is part of a combo hme/fas card).

/~\ The ASCII				der Mouse
\ / Ribbon Campaign
 X  Against HTML	       mouse@rodents.montreal.qc.ca
/ \ Email!	     7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B