Subject: kern/9206: msdosfs behaving badly
To: None <gnats-bugs@gnats.netbsd.org>
From: Paulo Alexandre Pinto Pires <pappires@ppires.org>
List: netbsd-bugs
Date: 01/16/2000 09:32:18
>Number:         9206
>Category:       kern
>Synopsis:       writing to msdosfs partitions fails and/or destroys data
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people (Kernel Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 16 09:30:00 2000
>Last-Modified:
>Originator:     Paulo Alexandre Pinto Pires
>Organization:
	COPPE/UFRJ
>Release:        19991116
>Environment:
	
System: NetBSD mateus.ppires.org 1.4O NetBSD 1.4O (MATEUS-19991117) #0: Wed Nov 17 01:50:25 BRST 1999 pappires@mateus.ppires.org:/usr/src/sys/arch/i386/compile/MATEUS-19991117 i386

MS-DOS FAT32 file system on a Quantum FIREBALL ST3.2S Ultra SCSI hard disk,
attached to an Adaptec 2940UW.


>Description:
	When writing to an msdos file system, the system sometimes fails to
	perform the requested operation, or it writes wrong data to files,
	directories or even to the file allocation tables.  As a result,
	even fairly brief use can leave you with destroyed file contents
	or (as happened to me) with a totally messed-up file system.

	In one of the more severe and strange failures, data written to a
	file deep in the directory tree was appended to MSDOS.SYS instead.
	This was not caused by the file system having been corrupted before
	NetBSD was run, because it happened just after the MS-DOS partition
	had been formatted.

	The problem has shown up for me on different computers, with
	different hard disks and interface types (both SCSI and ATA/IDE),
	and it seems to have been present since at least 1.4K.  It may also
	be present in earlier -current snapshots, but it does not show up
	in the 1.4 release.

	At first, it looked like some kind of timing problem, or an effect
	of heavy load, so I wrote a program to copy an entire directory
	tree, taking care to call open(2) with O_SYNC|O_DSYNC and to call
	sync(2) after each write(2) or mkdir(2).  I also made the program
	retry a failed operation up to five times, printing a warning
	before each retry.  Many operations failed once or twice but
	succeeded on retry.  Then I had to abort the program, because a
	very crowded sub-directory was created as a regular file, so every
	subsequent operation on that sub-directory and its descendants failed.

>How-To-Repeat:
	Copy a large directory tree (or extract a large, complex archive)
	onto the msdos volume.  Either the system reports that some files
	or directories could not be created, or files get written with
	wrong contents or in the wrong place.

>Fix:
	The only workaround I could think of is booting from the 1.4
	installation floppy or CD.  1.4 does not seem to be affected by
	any of these symptoms.
>Audit-Trail:
>Unformatted: