Subject: kern/37590: Writing data to a filesystem on an external USB drive fails
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <rillig@NetBSD.org>
List: netbsd-bugs
Date: 12/21/2007 19:40:00
>Number: 37590
>Category: kern
>Synopsis: Writing data to a filesystem on an external USB drive fails
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 21 19:40:00 +0000 2007
>Originator: Roland Illig
>Release: NetBSD 4.99.30
>Organization:
>Environment:
NetBSD bacc.roland-illig.de 4.99.30 NetBSD 4.99.30 (GENERIC) #2: Fri Aug 31 20:40:16 CEST 2007 build@bacc.roland-illig.de:/home/scratch/build/NetBSD/2007-08/work/sys/arch/i386/compile/GENERIC i386
>Description:
I have an external USB disk:
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
NVIDIA nForce MCP55 Memory Controller (RAM memory, revision 0xa1) at pci0 dev 0 function 0 not configured
...
ehci0 at pci0 dev 2 function 1: NVIDIA nForce MCP55 EHCI USB Controller (rev. 0xa2)
APCL: Picked IRQ 21 with weight 0
ehci0: interrupting at ioapic0 pin 21 (irq 10)
ehci0: BIOS has given up ownership
ehci0: EHCI version 1.0
ehci0: companion controller, 10 ports each: ohci0
usb1 at ehci0: USB revision 2.0
uhub1 at usb1
uhub1: NVIDIA EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
...
umass0 at uhub1 port 9 configuration 1 interface 0
umass0: Western Digital External HDD, rev 2.00/1.06, addr 2
umass0: using SCSI over Bulk-Only
scsibus0 at umass0: 2 targets, 1 lun per target
sd0 at scsibus0 target 0 lun 0: <WD, 5000AAK External, 1.06> disk fixed
sd0: 465 GB, 16383 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors
When I write files to it and read them back, the data in some sectors has changed. Writing directly to the disk (without a filesystem) works.
>How-To-Repeat:
# disklabel sd0 | tail -n 4
#        size    offset     fstype [fsize bsize cpg/sgs]
 c: 976773105        63     unused      0     0        # (Cyl.      0*- 969020)
 d: 976773168         0     unused      0     0        # (Cyl.      0 - 969020)
 e: 976773105        63     4.2BSD      0     0     0  # (Cyl.      0*- 969020)
newfs /dev/rsd0e
mount_ffs /dev/sd0e /mnt/backup
bacc:~ # dd if=/home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar of=/mnt/backup/home2.tar bs=1048576 count=128
128+0 records in
128+0 records out
134217728 bytes transferred in 7.360 secs (18236104 bytes/sec)
bacc:~ # cmp -l /home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar /mnt/backup/home2.tar | sed 10q
cmp: EOF on /mnt/backup/home2.tar: char 134217729, line 2098791
bacc:~ # dd if=/home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar of=/mnt/backup/home2.tar bs=1048576 count=256
256+0 records in
256+0 records out
268435456 bytes transferred in 15.730 secs (17065191 bytes/sec)
bacc:~ # cmp -l /home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar /mnt/backup/home2.tar | sed 10q
cmp: EOF on /mnt/backup/home2.tar: char 268435457, line 3274554
bacc:~ # dd if=/home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar of=/mnt/backup/home2.tar bs=1048576 count=512
512+0 records in
512+0 records out
536870912 bytes transferred in 32.433 secs (16553230 bytes/sec)
bacc:~ # cmp -l /home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar /mnt/backup/home2.tar | sed 10q
909313 65 0
909314 0 57
909315 0 150
909316 0 157
909317 66 155
909318 0 145
909319 0 57
909320 0 162
909321 64 157
909322 0 154
bacc:~ # dd if=/home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar of=/mnt/backup/home2.tar bs=1048576 count=511
511+0 records in
511+0 records out
535822336 bytes transferred in 32.208 secs (16636311 bytes/sec)
bacc:~ # cmp -l /home/scratch/roland/backup/2006-09-02/home-2006-09-02.tar /mnt/backup/home2.tar | sed 10q
3399681 225 164
3399682 2 141
3399683 0 154
3399684 0 157
3399685 226 147
3399686 2 0
3399687 0 153
3399688 0 144
3399689 37 145
3399690 4 154
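The `cmp -l` output above lists each differing byte as a 1-based decimal byte number followed by the byte from the first file (the source tar) and from the second file (the copy on the USB disk), both in octal. A small decoder like the following sketch can turn those lines into a file offset, a file-relative 512-byte sector number and the bytes actually read back. The function names are hypothetical, and the 8192-byte alignment it tests against is an arbitrary choice, not the filesystem's real block size (which this report does not state). Note that the first differing byte is not necessarily the start of the corrupted region, since leading bytes may happen to match.

```python
SECTOR_SIZE = 512  # "512 bytes/sect" as reported for sd0 in the dmesg output


def parse_cmp_line(line):
    """Split one `cmp -l` line into (0-based offset, byte in file1, byte in file2).

    cmp -l prints a 1-based decimal byte number followed by the two
    differing bytes in octal.
    """
    pos, a, b = line.split()
    return int(pos) - 1, int(a, 8), int(b, 8)


def describe(lines, align=8192):
    """Summarise a run of differences reported by `cmp -l`.

    `align` is an arbitrary alignment to test the first difference against;
    it is an assumption, not taken from the filesystem's block size.
    """
    diffs = [parse_cmp_line(line) for line in lines if line.strip()]
    if not diffs:
        return None
    offsets = [off for off, _, _ in diffs]
    first = offsets[0]
    return {
        "offset": first,
        # sector number relative to the start of the file, not the disk
        "sector": first // SECTOR_SIZE,
        "aligned": first % align == 0,
        # True if every listed byte directly follows the previous one
        "contiguous": offsets == list(range(first, first + len(offsets))),
        # the bytes read back from the copy on the USB disk (file2)
        "copy_bytes": bytes(b for _, _, b in diffs),
    }
```

Fed the ten lines from the count=512 run, this reports offset 909312 (file sector 1776) with the copy containing b"\x00/home/rol"; the count=511 run gives offset 3399680 (file sector 6640) and b"talog\x00kdel". In both runs the first difference falls on an 8 KiB boundary, and the replaced bytes are mostly printable text rather than scattered bit flips.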
>Fix:
I have no idea.
In 2008 I plan to write a hard disk checker, analogous to memtest, to find out whether the disk or the filesystem is at fault. I strongly suspect the filesystem, since writing directly to /dev/rsd0d works fine.
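A memtest-style checker along those lines could be sketched as follows. It is only an illustration under assumptions: the function names are hypothetical, it writes to an ordinary file (for example on /mnt/backup), and it stamps every 512-byte sector with its own sector number, so that a sector holding misplaced data also reveals which sector that data was meant for. Unmounting and remounting the filesystem between writing and verifying should make it more likely that the verification reads come from the disk rather than from the buffer cache.

```python
import os
import struct

SECTOR = 512  # bytes per sector, as reported for sd0


def sector_pattern(n):
    # Repeat the little-endian 64-bit sector number to fill the whole
    # sector, so misplaced data identifies the sector it belongs to.
    return struct.pack("<Q", n) * (SECTOR // 8)


def write_pattern(path, nsectors, chunk=2048):
    """Write `nsectors` stamped sectors to `path`, `chunk` sectors per write."""
    with open(path, "wb") as f:
        for start in range(0, nsectors, chunk):
            end = min(start + chunk, nsectors)
            f.write(b"".join(sector_pattern(n) for n in range(start, end)))
        f.flush()
        os.fsync(f.fileno())  # push the data towards the disk before returning


def verify_pattern(path, nsectors):
    """Return (sector, found) for every sector that does not match.

    `found` is the sector number stamped at the start of the data actually
    read back, or None if the file ended before 8 bytes could be read.
    """
    bad = []
    with open(path, "rb") as f:
        for n in range(nsectors):
            data = f.read(SECTOR)
            if data != sector_pattern(n):
                found = struct.unpack("<Q", data[:8])[0] if len(data) >= 8 else None
                bad.append((n, found))
    return bad
```

For example, write_pattern("/mnt/backup/pattern.bin", 1048576) writes 536870912 bytes, the same amount as the failing count=512 run; after remounting, verify_pattern with the same arguments returns an empty list if every sector came back intact.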