tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

write alignment matters?



I've just run into something (on 4.0.1) which looks to me as though
write() buffer alignment matters.  This sounds to me like a bug, but it
appears to have something to do with the hardware, and I'd appreciate
any thoughts on how I might best track it down.

I tried to run one of my tools (a disk verifier which, as I ran it,
just writes distinctive data to a disk).  I ran it as

disk-check -m sd1d -w /dev/rsd1d

which writes stuff to /dev/rsd1d.  I got messages indicating write
errors.  This seemed weird enough I started investigating further.
ktrace indicated that disk-check was doing

  1019      1 disk-check CALL  open(0xbfbfeb8a,1,0)
  1019      1 disk-check NAMI  "/dev/rsd1d"
  1019      1 disk-check RET   open 3
  1019      1 disk-check CALL  __fstat30(3,0xbfbfe998)
  1019      1 disk-check RET   __fstat30 0
  1019      1 disk-check CALL  ioctl(3,DIOCGDINFO,0xbfbfe804)
  1019      1 disk-check GIO   fd 3 read 404 bytes
[actual data snipped for brevity]
  1019      1 disk-check RET   ioctl 0
  1019      1 disk-check CALL  
__sigaction_sigtramp(SIGINFO,0xbfbfe970,0,0xbbb730c4,2)
  1019      1 disk-check RET   __sigaction_sigtramp 0
  1019      1 disk-check CALL  lseek(3,0,0,0,0)
  1019      1 disk-check RET   lseek 0
  1019      1 disk-check CALL  write(3,0x804baa0,0x400000)
  1019      1 disk-check GIO   fd 3 wrote 0 bytes
       ""
  1019      1 disk-check RET   write 0

which of course isn't right.  I tried with dd, which worked; the only
difference I could see in kdump output that looked even barely possibly
significant was that dd's buffer was aligned to a multiple of 4K.  So I
taught disk-check to align its write buffer and the weird write
behaviour vanished.

Sometimes, depending on I'm not sure what, the write writes 64K of data
(but then an attempt to write 4128768 bytes from a buffer address 64K
further advanced usually returns 0 immediately).

It does not appear to be related to any of my kernel hacks; at least,
when I boot the stock distribution GENERIC kernel, I see the same
(mis)behaviour from my test program (below) as under my kernel.

It does, however, appear to have something to do with the hardware
(personally, I suspect the disk driver); if I try it on another 4.0.1
machine on vnd0d backed by an ordinary file, it doesn't misbehave, and
if I try it on a real disk partition on that machine (but a
non-RAW_PART partition on a wd drive, rather than sd1d), it doesn't
misbehave either.

I wrote a small test program.  I'd be interested to hear if anyone else
can get it to misbehave, or if anyone has any suggestions for what
could possibly be behind this.

The hardware on which this misbehaves is an HP NetServer LP 1000r; the
disk is <HP 73.4G, ATLAS10K3_73_SCA, HP11> in an external enclosure, on
the built-in SCSI interface.  Full dmesg is below.

/~\ The ASCII                             Mouse
\ / Ribbon Campaign
 X  Against HTML                mouse%rodents-montreal.org@localhost
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

Here's that test program, and the results I see:

# cat z.c
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <strings.h>

#define MISALIGN 2321
#define BUFSIZE (1<<22)
#define ALIGN 4096

static unsigned char dbuf[BUFSIZE+ALIGN+MISALIGN];

int main(void);
int main(void)
{
 unsigned char *dp_align;
 unsigned char *dp_misalign;
 int i;

 for (i=0;i<sizeof(dbuf);i++) dbuf[i] = i & 0xff;
 dp_align = &dbuf[0];
 if ((unsigned long int)dp_align % ALIGN)
  { dp_align += ALIGN - ((unsigned long int)dp_align % ALIGN);
  }
 dp_misalign = dp_align + MISALIGN;
 lseek(1,0,SEEK_SET);
 i = write(1,dp_align,BUFSIZE);
 if (i < 0)
  { fprintf(stderr,"aligned write: error %d (%s)\n",errno,strerror(errno));
  }
 else
  { fprintf(stderr,"aligned write: %d\n",i);
  }
 lseek(1,0,SEEK_SET);
 i = write(1,dp_misalign,BUFSIZE);
 if (i < 0)
  { fprintf(stderr,"misaligned write: error %d (%s)\n",errno,strerror(errno));
  }
 else
  { fprintf(stderr,"misaligned write: %d\n",i);
  }
 return(0);
}
# cc -o z z.c
# ./z > /dev/rsd1d
aligned write: 4194304
misaligned write: 65535
# 

And the dmesg I promised (this is from my kernel, but, as I said, the
stock 4.0.1 GENERIC on this same hardware misbehaves the same way,
though I need to boot -c and "disable acpi" to get it to boot at all):

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
    2006, 2007
    The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
    The Regents of the University of California.  All rights reserved.

NetBSD 4.0.1 (NETSERVER) #0: Mon Jun 20 03:50:52 EDT 2011
        mouse@:/home/mouse/kbuild/NETSERVER
total memory = 1023 MB
rbus: rbus_min_start set to 0x40000000
avail memory = 995 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
BIOS32 rev. 0 found at 0xfd8e2
mainbus0 (root)
mainbus0: Intel MP Specification (Version 1.4) (HP       LP 1Kr/2Kr  )
cpu0 at mainbus0: apid 3 (boot processor)
cpu0: Intel Pentium III (686-class), 1266.79 MHz, id 0x6b1
cpu0: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu0: features 383fbff<FXSR,SSE>
cpu0: "Intel(R) Pentium(R) III CPU family      1266MHz"
cpu0: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu0: L2 cache 512 KB 32B/line 8-way
cpu0: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu0: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
cpu0: calibrating local timer
cpu0: apic clock running at 133 MHz
cpu0: 16 page colors
cpu1 at mainbus0: apid 0 (application processor)
cpu1: starting
cpu1: Intel Pentium III (686-class), 1266.71 MHz, id 0x6b1
cpu1: features 383fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu1: features 383fbff<PGE,MCA,CMOV,PAT,PSE36,MMX>
cpu1: features 383fbff<FXSR,SSE>
cpu1: "Intel(R) Pentium(R) III CPU family      1266MHz"
cpu1: I-cache 16 KB 32B/line 4-way, D-cache 16 KB 32B/line 4-way
cpu1: L2 cache 512 KB 32B/line 8-way
cpu1: ITLB 32 4 KB entries 4-way, 2 4 MB entries fully associative
cpu1: DTLB 64 4 KB entries 4-way, 8 4 MB entries 4-way
mpbios: bus 0 is type PCI   
mpbios: bus 1 is type PCI   
mpbios: bus 2 is type ISA   
ioapic0 at mainbus0 apid 1 (I/O APIC)
ioapic0: pa 0xfec00000, version 11, 16 pins
ioapic1 at mainbus0 apid 2 (I/O APIC)
ioapic1: pa 0xfec01000, version 11, 16 pins
mpbios: WARNING: pin 17 for apic 2 too high; assuming ACPI global int value
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x06)
pchb1 at pci0 dev 0 function 1
pchb1: ServerWorks CNB20-HE PCI/AGP bridge (rev. 0x06)
pci1 at pchb1 bus 1
pci1: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
esiop0 at pci1 dev 5 function 0: Symbios Logic 53c1010-33 (ultra3-wide scsi)
esiop0: using on-board RAM
esiop0: interrupting at ioapic1 pin 8 (irq 5)
scsibus0 at esiop0: 16 targets, 8 luns per target
esiop1 at pci1 dev 5 function 1: Symbios Logic 53c1010-33 (ultra3-wide scsi)
esiop1: using on-board RAM
esiop1: interrupting at ioapic1 pin 9 (irq 12)
scsibus1 at esiop1: 16 targets, 8 luns per target
fxp0 at pci0 dev 2 function 0: i82559 Ethernet, rev 8
fxp0: interrupting at ioapic1 pin 6 (irq 11)
fxp0: Ethernet address 00:30:6e:11:cc:e7
inphy0 at fxp0 phy 1: i82555 10/100 media interface, rev. 4
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vga1 at pci0 dev 7 function 0: ATI Technologies Rage XL (rev. 0x27)
wsdisplay0 at vga1 kbdmux 1: console (80x25, mterm emulation)
wsmux1: connecting to wsdisplay0
fxp1 at pci0 dev 8 function 0: i82559 Ethernet, rev 8
fxp1: interrupting at ioapic1 pin 7 (irq 11)
fxp1: Ethernet address 00:30:6e:11:cc:e8
inphy1 at fxp1 phy 1: i82555 10/100 media interface, rev. 4
inphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcib0 at pci0 dev 15 function 0
pcib0: ServerWorks OSB4 southbridge (rev. 0x50)
rccide0 at pci0 dev 15 function 1
rccide0: ServerWorks OSB4 IDE Controller (rev. 0x00)
rccide0: bus-master DMA support present
rccide0: primary channel configured to compatibility mode
rccide0: primary channel interrupting at ioapic0 pin 14 (irq 14)
atabus0 at rccide0 channel 0
rccide0: secondary channel configured to compatibility mode
rccide0: secondary channel interrupting at ioapic0 pin 15 (irq 15)
atabus1 at rccide0 channel 1
ohci0 at pci0 dev 15 function 2: ServerWorks OSB4/CSB5 USB Host Controller 
(rev. 0x04)
ohci0: interrupting at ioapic1 pin 17 (irq 10)
ohci0: OHCI version 1.0, legacy support
usb0 at ohci0: USB revision 1.0
uhub0 at usb0
uhub0: ServerWorks OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
isa0 at pcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
attimer0 at isa0 port 0x40-0x43: AT Timer
pcppi0 at isa0 port 0x61
pcppi0: children must have an explicit unit
midi0 at pcppi0: PC speaker (CPU-intensive output)
sysbeep0 at pcppi0
isapnp0 at isa0 port 0x279: ISA Plug 'n Play device support
npx0 at isa0 port 0xf0-0xff
npx0: reported by CPUID; using exception 16
fdc0 at isa0 port 0x3f0-0x3f7 irq 6 drq 2
pcppi0: attached to attimer0
isapnp0: no ISA Plug 'n Play devices found
ioapic0: enabling
ioapic1: enabling
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
fd0 at fdc0 drive 0: 1.44MB, 80 cyl, 2 head, 18 sec
Kernelized RAIDframe activated
scsibus0: waiting 2 seconds for devices to settle...
scsibus1: waiting 2 seconds for devices to settle...
sd0 at scsibus0 target 0 lun 0: <SEAGATE, ST39102LCSUN9.0G, 6329> disk fixed
sd0: 8637 MB, 4926 cyl, 27 head, 133 sec, 512 bytes/sect x 17689267 sectors
sd0: sync (25.00ns offset 15), 16-bit (80.000MB/s) transfers, tagged queueing
ses0 at scsibus0 target 11 lun 0: <SDR, GEM318, 0> processor fixed
ses0: SAF-TE Compliant Device
ses0: async, 8-bit transfers
sd1 at scsibus1 target 0 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd1: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd1: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd2 at scsibus1 target 1 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd2: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd2: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd3 at scsibus1 target 2 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd3: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd3: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd4 at scsibus1 target 3 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd4: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd4: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd5 at scsibus1 target 4 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd5: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd5: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd6 at scsibus1 target 5 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd6: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd6: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd7 at scsibus1 target 6 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd7: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd7: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd8 at scsibus1 target 8 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd8: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd8: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd9 at scsibus1 target 9 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd9: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd9: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd10 at scsibus1 target 10 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd10: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd10: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd11 at scsibus1 target 11 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd11: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd11: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd12 at scsibus1 target 12 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP11> disk fixed
sd12: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd12: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd13 at scsibus1 target 13 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP05> disk fixed
sd13: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd13: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
sd14 at scsibus1 target 14 lun 0: <HP 73.4G, ATLAS10K3_73_SCA, HP05> disk fixed
sd14: 70007 MB, 31022 cyl, 8 head, 577 sec, 512 bytes/sect x 143374738 sectors
sd14: sync (25.00ns offset 31), 16-bit (80.000MB/s) transfers, tagged queueing
ses1 at scsibus1 target 15 lun 0: <HP, A6491A, HP16> enclosure services fixed
ses1: SCSI-3 SES Device
ses1: async, 16-bit transfers
atapibus0 at atabus0: 2 targets
cd0 at atapibus0 drive 0: <CD-224E, , 1.5A> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2
cd0(rccide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
RF autoconfig: raw component list: .
RF autoconfig: sets: .
RF autoconfig: parts-missing list: .
RF autoconfig: dropping unconfigurable list: .
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
cpu1: CPU 0 running
wsdisplay0: screen 1 added (80x25, mterm emulation)
wsdisplay0: screen 2 added (80x25, mterm emulation)
wsdisplay0: screen 3 added (80x25, mterm emulation)
wsdisplay0: screen 4 added (80x25, mterm emulation)
wsdisplay0: screen 5 added (80x25, mterm emulation)
wsdisplay0: screen 6 added (80x25, vt100 emulation)
wsdisplay0: screen 7 added (80x25, sun emulation)


Home | Main Index | Thread Index | Old Index